Snappy (software) - Stream Format

Stream Format

Snappy encoding is byte-oriented rather than bit-oriented: only whole bytes are emitted to or consumed from a stream. The format uses no entropy coder, such as Huffman coding or arithmetic coding.

The stream begins with the length of the uncompressed data, stored as a little-endian varint, a variable-length integer encoding. The lower seven bits of each byte carry data, and the high (most significant) bit is a flag that tells whether the next byte belongs to the same integer.
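
The varint layout described above can be sketched in Python as follows. This is a minimal illustration, not code from the Snappy library; the function name `read_varint` and its return convention are assumptions made for this example.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a little-endian base-128 varint starting at buf[pos].

    Returns (value, position of the first byte after the varint).
    The name and signature are hypothetical, chosen for illustration.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        # The lower seven bits carry data, least significant group first.
        value |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of this integer.
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7
```

For instance, the single byte 0x40 decodes to 64, and the three bytes 0xFE 0xFF 0x7F decode to 2097150.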

The remaining bytes in the stream are encoded using one of four element types. The element type is encoded in the first byte (the tag byte) of the element. The two lower bits of this byte are the type code:

  • 00 – Literal – uncompressed data; the upper 6 bits of the tag byte store the length of the data; if the length is more than 60 bytes, it is stored in additional bytes that follow the tag byte
  • 01 – Copy with length stored as 3 bits and offset stored as 11 bits; the byte after the tag byte holds the lower 8 bits of the offset
  • 10 – Copy with length stored as 6 bits of the tag byte and offset stored as a two-byte little-endian integer after the tag byte
  • 11 – Copy with length stored as 6 bits of the tag byte and offset stored as a four-byte little-endian integer after the tag byte
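
The four element types above can be told apart with a few shifts and masks. The sketch below is illustrative only. The exact biases (literal and 6-bit copy lengths stored minus one, the 3-bit copy length stored minus four, and upper-bit values 60 to 63 selecting one to four extra length bytes) are not stated in the text above; they follow the Snappy format description. The function name `parse_element` and its tuple return value are assumptions made for this example.

```python
def parse_element(buf: bytes, pos: int):
    """Parse one element header starting at buf[pos].

    Returns (kind, length, offset, next_pos). For a literal, offset is
    None and next_pos points at the first literal data byte.
    Hypothetical helper for illustration, not a Snappy library function.
    """
    tag = buf[pos]
    pos += 1
    type_code = tag & 0x03   # two lower bits select the element type
    upper = tag >> 2         # remaining six bits of the tag byte

    def read_le(n: int) -> int:
        # Read an n-byte little-endian integer that follows the tag byte.
        if pos + n > len(buf):
            raise ValueError("truncated element")
        return int.from_bytes(buf[pos:pos + n], "little")

    if type_code == 0:                     # 00: literal
        if upper < 60:
            return ("literal", upper + 1, None, pos)
        extra = upper - 59                 # 60..63 -> 1..4 length bytes
        return ("literal", read_le(extra) + 1, None, pos + extra)

    if type_code == 1:                     # 01: 3-bit length, 11-bit offset
        length = (upper & 0x07) + 4
        offset = ((tag >> 5) << 8) | read_le(1)
        return ("copy", length, offset, pos + 1)

    width = 2 if type_code == 2 else 4     # 10: 2-byte, 11: 4-byte offset
    return ("copy", upper + 1, read_le(width), pos + width)
```

As a usage example, the tag byte 0x10 describes a 5-byte literal, and the two bytes 0x2D 0x2C describe a copy of 7 bytes from offset 300.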

A copy refers to the dictionary, which is simply the data decompressed so far. The offset is the distance from the current position back into the already decompressed data. The length is the number of bytes to copy from the dictionary. The current Snappy compressor limits the size of the dictionary to 32768 bytes.
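
Applying a copy amounts to re-reading bytes from the output produced so far. The following is a minimal sketch under that reading; `apply_copy` is a hypothetical helper name, and copying one byte at a time is a choice made here so that a copy whose length exceeds its offset repeats the most recent bytes, as the Snappy format description allows.

```python
def apply_copy(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes to `out`, read from `offset` bytes back.

    Hypothetical helper for illustration. `out` holds the data
    decompressed so far and serves as the dictionary.
    """
    if offset <= 0 or offset > len(out):
        raise ValueError("offset points outside the decompressed data")
    start = len(out) - offset
    for i in range(length):
        # Reading one byte at a time lets an overlapping copy
        # (length > offset) see bytes appended earlier in this loop.
        out.append(out[start + i])
```

For example, with `out` holding `b"ab"`, a copy with offset 2 and length 4 extends it to `b"ababab"`.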
