Data Conversion - Lost and Inexact Data Conversion

Lost and Inexact Data Conversion

The objective of data conversion is to maintain all of the data, and as much of the embedded information as possible. This can only be done if the target format supports the same features and data structures present in the source file. Conversion of a word processing document to a plain text file necessarily involves loss of formatting information, because plain text format does not support word processing constructs such as marking a word as boldface. For this reason, conversion from one format to one which does not support a feature which is important to the user is rarely carried out, though it may be necessary for interoperability, e.g. converting a file from one version of Microsoft Word to an earlier version to enable transfer and use by other users who do not have the same later version of Word installed on their computer.

Loss of information can be mitigated by approximation in the target format. There is no way of converting a character like ä to ASCII, since the ASCII standard lacks it, but the information may be retained by approximating the character as ae. Of course, this is not an optimal solution, and can impact operations like searching and copying; and if a language makes a distinction between ä and ae, then that approximation does involve loss of information.

Data conversion can also suffer from inexactitude, the result of converting between formats that are conceptually different. The WYSIWYG paradigm, extant in word processors and desktop publishing applications, versus the structural-descriptive paradigm, found in SGML, XML and many applications derived therefrom, like HTML and MathML, is one example. Using a WYSIWYG HTML editor conflates the two paradigms, and the result is HTML files with suboptimal, if not nonstandard, code. In the WYSIWYG paradigm a double linebreak signifies a new paragraph, as that is the visual cue for such a construct, but a WYSIWYG HTML editor will usually convert such a sequence to

, which is structurally no new paragraph at all. As another example, converting from PDF to an editable word processor format is a tough chore, because PDF records the textual information like engraving on stone, with each character given a fixed position and linebreaks hard-coded, whereas word processor formats accommodate text reflow. PDF does not know of a word space character—the space between two letters and the space between two words differ only in quantity. Therefore, a title with ample letter-spacing for effect will usually end up with spaces in the word processor file, for example INTRODUCTION with spacing of 1 em as I N T R O D U C T I O N on the word processor.

Read more about this topic:  Data Conversion

Famous quotes containing the words lost and, lost, inexact, data and/or conversion:

    And get the fatted calf and kill it, and let us eat and celebrate; for this son of mine was dead and is alive again; he was lost and is found!’ And they began to celebrate.
    Bible: New Testament, Luke 15:23,24.

    Was there ever a cause too lost,
    Ever a cause that was lost too long....
    Robert Frost (1874–1963)

    Thanks to recent trends in the theory of knowledge, history is now better aware of its own worth and unassailability than it formerly was. It is precisely in its inexact character, in the fact that it can never be normative and does not have to be, that its security lies.
    Johan Huizinga (1872–1945)

    Mental health data from the 1950’s on middle-aged women showed them to be a particularly distressed group, vulnerable to depression and feelings of uselessness. This isn’t surprising. If society tells you that your main role is to be attractive to men and you are getting crow’s feet, and to be a mother to children and yours are leaving home, no wonder you are distressed.
    Grace Baruch (20th century)

    The conversion of a savage to Christianity is the conversion of Christianity to savagery.
    George Bernard Shaw (1856–1950)