External Memory Algorithms - Data Organization and Representation

Data Organization and Representation

A modern digital computer represents data using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer or device whose storage space is large enough to accommodate the binary representation of the piece of information, or simply data. For example, the complete works of Shakespeare, about 1250 pages in print, can be stored in about five megabytes (40 million bits) with one byte per character.

Data is encoded by assigning a bit pattern to each character, digit, or multimedia object. Many standards exist for encoding (e.g., character encodings like ASCII, image encodings like JPEG, video encodings like MPEG-4). By adding bits to each encoded unit, the redundancy allows both to detect errors in coded data and to correct them based on mathematical algorithms. Errors occur regularly in low probabilities due to random bit value flipping, or "physical bit fatigue," loss of the physical bit in storage its ability to maintain distinguishable value (0 or 1), or due to errors in inter or intra-computer communication. A random bit flip (e.g., due to random radiation) is typically corrected upon detection. A bit, or a group of malfunctioning physical bits (not always the specific defective bit is known; group definition depends on specific storage device) is typically automatically fenced-out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The cyclic redundancy check (CRC) method is typically used in storage for error detection and correction.

Data compression methods allow in many cases to represent a string of bits by a shorter bit string ("compress") and reconstruct the original string ("decompress") when needed. This allows to utilize substantially less storage (tens of percents) for many types of data at the cost of more computation (compress and decompress when needed). Analysis of trade-off between storage cost saving and costs of related computations and possible delays in data availability is done before deciding whether to keep certain data in a database compressed or not.

For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots.

Read more about this topic:  External Memory Algorithms

Famous quotes containing the words data and/or organization:

    To write it, it took three months; to conceive it three minutes; to collect the data in it—all my life.
    F. Scott Fitzgerald (1896–1940)

    The organization controlling the material equipment of our everyday life is such that what in itself would enable us to construct it richly plunges us instead into a poverty of abundance, making alienation all the more intolerable as each convenience promises liberation and turns out to be only one more burden. We are condemned to slavery to the means of liberation.
    Raoul Vaneigem (b. 1934)