External Memory Algorithms - Data Organization and Representation

Data Organization and Representation

A modern digital computer represents data using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer or device whose storage space is large enough to accommodate the binary representation of the piece of information, or simply data. For example, the complete works of Shakespeare, about 1250 pages in print, can be stored in about five megabytes (40 million bits) with one byte per character.

Data is encoded by assigning a bit pattern to each character, digit, or multimedia object. Many standards exist for encoding (e.g., character encodings like ASCII, image encodings like JPEG, video encodings like MPEG-4). By adding bits to each encoded unit, the redundancy allows both to detect errors in coded data and to correct them based on mathematical algorithms. Errors occur regularly in low probabilities due to random bit value flipping, or "physical bit fatigue," loss of the physical bit in storage its ability to maintain distinguishable value (0 or 1), or due to errors in inter or intra-computer communication. A random bit flip (e.g., due to random radiation) is typically corrected upon detection. A bit, or a group of malfunctioning physical bits (not always the specific defective bit is known; group definition depends on specific storage device) is typically automatically fenced-out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The cyclic redundancy check (CRC) method is typically used in storage for error detection and correction.

Data compression methods allow in many cases to represent a string of bits by a shorter bit string ("compress") and reconstruct the original string ("decompress") when needed. This allows to utilize substantially less storage (tens of percents) for many types of data at the cost of more computation (compress and decompress when needed). Analysis of trade-off between storage cost saving and costs of related computations and possible delays in data availability is done before deciding whether to keep certain data in a database compressed or not.

For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots.

Read more about this topic:  External Memory Algorithms

Famous quotes containing the words data and/or organization:

    Mental health data from the 1950’s on middle-aged women showed them to be a particularly distressed group, vulnerable to depression and feelings of uselessness. This isn’t surprising. If society tells you that your main role is to be attractive to men and you are getting crow’s feet, and to be a mother to children and yours are leaving home, no wonder you are distressed.
    Grace Baruch (20th century)

    Science, unguided by a higher abstract principle, freely hands over its secrets to a vastly developed and commercially inspired technology, and the latter, even less restrained by a supreme culture saving principle, with the means of science creates all the instruments of power demanded from it by the organization of Might.
    Johan Huizinga (1872–1945)