Fragmentation (computing) - Data Fragmentation

Data fragmentation occurs when a collection of data in memory is broken up into many pieces that are not close together. It is typically the result of attempting to insert a large object into storage that has already suffered external fragmentation.

For example, files in a file system are usually managed in units called blocks or clusters. When a file system is created, there is free space to store file blocks together contiguously, which allows for rapid sequential file reads and writes. However, as files are added, removed, and changed in size, the free space becomes externally fragmented, leaving only small holes in which to place new data. When a new file is written, or when an existing file is extended, the operating system puts the new data in non-contiguous data blocks to fit into the available holes. The new data blocks are necessarily scattered, slowing access on mechanical disks due to seek time and rotational latency, and incurring additional overhead to manage the additional locations. This is called file system fragmentation.
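
To make the effect concrete, here is a minimal C sketch of the scenario above. The 12-block "disk" and the file layout are invented for illustration: four 3-block files fill the disk, two are deleted, and a new 5-block file then fits in the total free space but not in any single hole.

    #include <stdio.h>

    #define DISK 12
    #define FREE '.'

    /* Length of the largest contiguous run of free blocks. */
    int largest_hole(const char d[], int n) {
        int best = 0, run = 0;
        for (int i = 0; i < n; i++) {
            run = (d[i] == FREE) ? run + 1 : 0;
            if (run > best) best = run;
        }
        return best;
    }

    /* Total number of free blocks, contiguous or not. */
    int free_total(const char d[], int n) {
        int t = 0;
        for (int i = 0; i < n; i++) t += (d[i] == FREE);
        return t;
    }

    int main(void) {
        /* Four 3-block files fill the disk; deleting A and C leaves
         * two separate 3-block holes. */
        char disk[DISK + 1] = "AAABBBCCCDDD";
        for (int i = 0; i < DISK; i++)
            if (disk[i] == 'A' || disk[i] == 'C') disk[i] = FREE;
        printf("disk: %s\n", disk);   /* ...BBB...DDD */
        printf("%d blocks free, largest hole: %d blocks\n",
               free_total(disk, DISK), largest_hole(disk, DISK));
        /* A new 5-block file fits in the 6 free blocks only if it is
         * split across the two holes, i.e. fragmented. */
        return 0;
    }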

When writing a new file of a known size, if there are any empty holes that are larger than that file, the operating system can avoid data fragmentation by putting the file into any one of those holes. There are a variety of algorithms for selecting which of those potential holes to put the file into; each of them is a heuristic approximate solution to the bin packing problem. The "best fit" algorithm chooses the smallest hole that is big enough. The "worst fit" algorithm chooses the largest hole. The "first fit" algorithm chooses the first hole that is big enough. The "next fit" algorithm keeps track of where the last file was written and resumes its search from there. The "next fit" algorithm is faster than "first fit", which is in turn faster than "best fit", which is the same speed as "worst fit".
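
Each of these heuristics can be sketched in a few lines of C. The list of free holes below is invented for the example; note how next_fit keeps a cursor so that each search resumes where the previous one stopped.

    #include <stdio.h>

    #define NHOLES 5

    /* A free hole: starting block and length in blocks. */
    struct hole { int start, len; };

    /* First fit: the first hole big enough. Returns an index, or -1. */
    int first_fit(struct hole h[], int n, int need) {
        for (int i = 0; i < n; i++)
            if (h[i].len >= need) return i;
        return -1;
    }

    /* Best fit: the smallest hole that is still big enough. */
    int best_fit(struct hole h[], int n, int need) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (h[i].len >= need && (best < 0 || h[i].len < h[best].len))
                best = i;
        return best;
    }

    /* Worst fit: the largest hole, if it is big enough. */
    int worst_fit(struct hole h[], int n, int need) {
        int worst = -1;
        for (int i = 0; i < n; i++)
            if (h[i].len >= need && (worst < 0 || h[i].len > h[worst].len))
                worst = i;
        return worst;
    }

    /* Next fit: like first fit, but resume scanning where the previous
     * search stopped instead of restarting from hole 0. */
    int next_fit(struct hole h[], int n, int need, int *cursor) {
        for (int scanned = 0; scanned < n; scanned++) {
            int i = (*cursor + scanned) % n;
            if (h[i].len >= need) { *cursor = (i + 1) % n; return i; }
        }
        return -1;
    }

    int main(void) {
        struct hole holes[NHOLES] = {{0,3},{10,8},{25,4},{40,12},{60,6}};
        int cursor = 0, need = 5;
        printf("first fit: hole %d\n", first_fit(holes, NHOLES, need));
        printf("best fit:  hole %d\n", best_fit(holes, NHOLES, need));
        printf("worst fit: hole %d\n", worst_fit(holes, NHOLES, need));
        printf("next fit:  hole %d\n", next_fit(holes, NHOLES, need, &cursor));
        return 0;
    }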

Just as compaction can eliminate external fragmentation, data fragmentation can be eliminated by rearranging data storage so that related pieces are close together. For example, the primary job of a defragmentation tool is to rearrange blocks on disk so that the blocks of each file are contiguous. Most defragmenting utilities also attempt to reduce or eliminate free space fragmentation. Some moving garbage collectors will also move related objects close together (this is called compacting) to improve cache performance.
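
As an illustration of the idea (not of any particular tool's algorithm), the C sketch below "defragments" a toy block map: it rewrites the map so that each file's blocks become contiguous and all free space coalesces into one run at the end. The 16-block disk and the single-letter file names are assumptions made for the example; a real defragmenter moves the data itself and updates file-system metadata.

    #include <stdio.h>
    #include <string.h>

    #define DISK 16
    #define FREE '.'

    /* Rewrite the block map so each file occupies a contiguous run.
     * Files are placed in a fixed (alphabetical) order; everything
     * left over is one contiguous free region. */
    void defragment(char disk[], int n) {
        char out[DISK];
        int dst = 0;
        memset(out, FREE, n);
        for (char f = 'A'; f <= 'Z'; f++)
            for (int src = 0; src < n; src++)
                if (disk[src] == f) out[dst++] = f;
        memcpy(disk, out, n);
    }

    int main(void) {
        /* Three files (A, B, C) scattered across the disk. */
        char disk[DISK + 1] = "A.BB..A.CC.B..AC";
        defragment(disk, DISK);
        printf("%s\n", disk);   /* AAABBBCCC....... */
        return 0;
    }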

There are four kinds of systems that never experience data fragmentation -- they always store every file contiguously. Alas, all four kinds have significant disadvantages compared to systems that allow at least some temporary data fragmentation:

  • Simply write each file contiguously, as with CD-R. If there isn't already enough contiguous free space to hold the file, the system immediately fails to store the file -- even when there are lots of little bits of free space from deleted files that add up to more than enough to store the file.
  • If there isn't already enough contiguous free space to hold the file, use a copying collector to convert many little bits of free space into one contiguous free region big enough to hold the file. This takes a lot more time than breaking the file up into fragments and putting those fragments into the available free space.
  • Fixed-size-block allocation: write each file into any free block. If a programmer picks a block size that is too small, the system immediately fails to store some files -- files larger than the block size -- even when there are many free blocks that add up to more than enough to store the file. If a programmer picks a block size that is too big, a lot of space is wasted on internal fragmentation (see the sketch after this list).
  • Some systems avoid dynamic allocation entirely, pre-allocating (contiguous) space for all possible files they will need -- for example, MultiFinder pre-allocated a chunk of RAM to each application as it started, sized according to how much RAM that application's programmer claimed it would need.
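
As promised above, here is a minimal C sketch of the internal-fragmentation cost of fixed-size blocks: a file always consumes whole blocks, so the unused tail of its last block is wasted. The 4 KiB block size and the file sizes are assumptions chosen for illustration.

    #include <stdio.h>

    #define BLOCK 4096  /* assumed fixed block size, in bytes */

    int main(void) {
        long sizes[] = {100, 4096, 5000, 123456};  /* made-up file sizes */
        for (int i = 0; i < 4; i++) {
            long blocks = (sizes[i] + BLOCK - 1) / BLOCK;  /* round up */
            long wasted = blocks * BLOCK - sizes[i];       /* unused tail */
            printf("file of %7ld bytes -> %3ld blocks, %4ld bytes wasted\n",
                   sizes[i], blocks, wasted);
        }
        return 0;
    }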
