Byte Pair Encoding Example

Suppose we wanted to encode the data

aaabdaaabac

The byte pair "aa" occurs most often, so it will be replaced by a byte that is not used in the data, "Z". Now we have the following data and replacement table:

ZabdZabac
Z=aa
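A minimal Python sketch of this first step might look like the following. The helper name most_frequent_pair is hypothetical, and the sketch assumes that overlapping occurrences are counted (so "aaa" contributes two "aa" pairs) while the replacement itself proceeds left to right without overlap.

```python
from collections import Counter

def most_frequent_pair(data: str) -> tuple[str, int]:
    """Return the most common adjacent pair and its (overlapping) count."""
    # Slide a two-character window over the data and tally every pair.
    counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
    return counts.most_common(1)[0]

data = "aaabdaaabac"
pair, count = most_frequent_pair(data)

# str.replace substitutes non-overlapping occurrences from left to right.
step1 = data.replace(pair, "Z")
print(pair, count, step1)  # aa 4 ZabdZabac
```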

Then we repeat the process with the byte pair "ab", replacing it with "Y":

ZYdZYac
Y=ab
Z=aa

We could stop here, because the only remaining pair of literal bytes, "ac", occurs just once. Or we could continue the process and use recursive byte pair encoding, replacing "ZY" with "X":

XdXac
X=ZY
Y=ab
Z=aa

This data cannot be compressed further by byte pair encoding because there are no pairs of bytes that occur more than once.
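The whole procedure can be sketched as a loop that stops once no pair occurs more than once. This is an illustrative Python sketch, not a reference implementation: the function name bpe_compress, the pool of replacement symbols "ZYXWVU", and the tie-breaking rule are all assumptions. In particular, "Za" and "ab" both occur twice after the first step; ties here go to the lexicographically greatest pair, an arbitrary choice made only so that the sketch reproduces the walkthrough above.

```python
from collections import Counter

def bpe_compress(data: str, symbols: str = "ZYXWVU") -> tuple[str, list[tuple[str, str]]]:
    """Repeatedly replace the most frequent pair with an unused symbol."""
    # The replacement symbols must not already appear in the data.
    if any(s in data for s in symbols):
        raise ValueError("replacement symbol already used in the data")
    table: list[tuple[str, str]] = []  # (symbol, pair), in order of creation
    unused = iter(symbols)
    while True:
        counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
        if not counts:
            break
        # Highest count wins; ties go to the greatest pair (assumed rule).
        pair, count = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:
            break  # no pair occurs more than once
        symbol = next(unused, None)
        if symbol is None:
            break  # ran out of replacement symbols
        data = data.replace(pair, symbol)
        table.append((symbol, pair))
    return data, table

compressed, table = bpe_compress("aaabdaaabac")
print(compressed)  # XdXac
print(table)       # [('Z', 'aa'), ('Y', 'ab'), ('X', 'ZY')]
```

A different tie-breaking rule would choose "Za" instead of "ab" in the second step and produce a different replacement table, which is why the table has to be stored alongside the compressed data.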

To decompress the data, perform the replacements in reverse order: expand "X" to "ZY" first, then "Y" to "ab", and finally "Z" to "aa".
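Decompression can be sketched in a few lines of Python. The function name bpe_decompress and the representation of the table as a list of (symbol, pair) tuples in creation order are assumptions made for illustration.

```python
def bpe_decompress(data: str, table: list[tuple[str, str]]) -> str:
    """Undo the replacements, newest first."""
    # The most recently created symbol may expand into older symbols,
    # so the table is walked in reverse order of creation.
    for symbol, pair in reversed(table):
        data = data.replace(symbol, pair)
    return data

# Replacement table from the example, in the order the symbols were created.
example_table = [("Z", "aa"), ("Y", "ab"), ("X", "ZY")]
restored = bpe_decompress("XdXac", example_table)
print(restored)  # aaabdaaabac
```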
