Bit Array - Compression

Compression

Large bit arrays tend to have long streams of zeroes or ones. This phenomenon wastes storage and processing time. Run-length encoding is commonly used to compress these long streams. However, by compressing bit arrays too aggressively we run the risk of losing the benefits due to bit-level parallelism (vectorization). Thus, instead of compressing bit arrays as streams of bits, we might compress them as streams bytes or words (see Bitmap index (compression)).

The specific compression technique and implementation details can effect performance. Thus, it might be helpful in practice to benchmark the various implementations.

Examples:

  • compressedbitset: WAH Compressed BitSet for Java
  • javaewah: A compressed alternative to the Java BitSet class (using Enhanced WAH)
  • CONCISE: COmpressed 'N' Composable Integer Set, another bitmap compression scheme for Java
  • EWAHBoolArray: A compressed bitmap/bitset class in C++
  • CSharpEWAH: compressed bitset class in C#
  • SparseBitmap: a simple sparse bitmap implementation in Java

Read more about this topic:  Bit Array

Famous quotes containing the word compression:

    Do they [the publishers of Murphy] not understand that if the book is slightly obscure it is because it is a compression and that to compress it further can only make it more obscure?
    Samuel Beckett (1906–1989)

    The triumphs of peace have been in some proximity to war. Whilst the hand was still familiar with the sword-hilt, whilst the habits of the camp were still visible in the port and complexion of the gentleman, his intellectual power culminated; the compression and tension of these stern conditions is a training for the finest and softest arts, and can rarely be compensated in tranquil times, except by some analogous vigor drawn from occupations as hardy as war.
    Ralph Waldo Emerson (1803–1882)