Dictionary Coder - Examples

Examples

Example: The text to be compressed starts "HURLYBURLY". In the first six steps of the encoding, we output the indexes for H, U, R, L, Y, and B, and we add to the dictionary new entries for HU, UR, RL, LY, YB, and BU. On the seventh step, we are at the start of "URLY"; the longest entry in our dictionary that matches the text is "UR", an entry we added on the second step. We send the index for "UR" to the output, and add an entry for "URL" to the dictionary. On the eighth step, we send the index for "LY" to the output, and assuming that a space follows HURLYBURLY in the text, we add an entry for "LY " to the dictionary. If later in the text we were to encounter the word "HURLYBURLY" again, we could encode it this time (assuming we started at the H) in no more than five indexes:- HU, RL, YB, URL, and Y.

The LZ78 decoder receives each symbol and, if it already has a previous prefix, adds the prefix plus the symbol to its own separate dictionary. It then outputs the symbol and sets the prefix to the last character of the symbol. One "gotcha" here is that if the encoder sees a sequence of the form STRING STRING CHARACTER, where STRING is currently in the dictionary, it will output a symbol that is one higher than the decoder's last dictionary entry. The decoder must detect such an event and output the previous symbol plus its first character. This symbol will always be only one higher than the last numbered symbol in the decoder's dictionary.

Encoding HURLYBURLY
H H
U HU
R UR
L RL
Y LY
B YB
U BU
R -- (UR is in the dictionary already)
L URL
Y RLY (doesn't matter)

Example: The encoder is encoding BANANANANA; after outputting the indexes for B, A, N and AN the encoder has in its dictionary entries for BA, AN, NA, and ANA and the decoder has entries for BA, AN, and NA. The encoder can match "ANA" so it sends the index for "ANA" and adds "ANAN" to the dictionary. However, the decoder doesn't have "ANA" in its dictionary. It must guess that this new symbol is the prefix (the last symbol it received, "AN") plus its first character ("A"). It then outputs "ANA" and adds the prefix plus the last character of the output ("A" again) to the dictionary. Decoding can continue from there.

Another dictionary coding scheme is byte pair encoding, where a byte that does not appear in the source text is assigned to represent the most commonly appearing two-byte combination. This can be done repeatedly as long as there are bytes that do not appear in the source text, and bytes that are already representing combinations of other bytes can themselves appear in combinations.

Read more about this topic:  Dictionary Coder

Famous quotes containing the word examples:

    It is hardly to be believed how spiritual reflections when mixed with a little physics can hold people’s attention and give them a livelier idea of God than do the often ill-applied examples of his wrath.
    —G.C. (Georg Christoph)

    In the examples that I here bring in of what I have [read], heard, done or said, I have refrained from daring to alter even the smallest and most indifferent circumstances. My conscience falsifies not an iota; for my knowledge I cannot answer.
    Michel de Montaigne (1533–1592)

    Histories are more full of examples of the fidelity of dogs than of friends.
    Alexander Pope (1688–1744)