Lyndon Word - Standard Factorization

Standard Factorization

According to the Chen–Fox–Lyndon theorem, every string may be formed in a unique way by concatenating a sequence of Lyndon words, in such a way that the words in the sequence are nonincreasing lexicographically. The final Lyndon word in this sequence is the lexicographically smallest suffix of the given string. A factorization into a nonincreasing sequence of Lyndon words (a so-called Lyndon factorization) may be constructed in linear time. Lyndon factorizations may be used as part of a bijective variant of the Burrows–Wheeler transform for data compression, and in algorithms for digital geometry.

Duval (1983) developed an algorithm for standard factorization that runs in linear time and constant space. It iterates over a string trying to find as long Lyndon word as possible. When it finds one, it adds it to the result list and proceeds to search in the remaining part of string. Resulting list of strings is standard factorization of a given string. More formal description of algorithm follows.

Given a string S of length N, one should proceed with the following steps:

  1. Let m be the index of the symbol-candidate to be appended to already collected ones. Initially, m = 1 (counting of symbols in a string starts from zero).
  2. Let k be the index of the symbol we would compare others to. Initially, k = 0.
  3. While k and m is less than N, compare S (kth symbol of a string S) to S. There are three possible outcomes:
    1. S is equal to S: append S to the current collected symbols. Increment k and m.
    2. S is less than S: if we append S to the current collected symbols we'll get Lyndon word. But we can't add it to the result list yet because it may be just a part of a larger Lyndon word. Thus, just increment m and set k to 0 so the next symbol would be compared to the first one in the string.
    3. S is greater than S: if we append S to the current collected symbols, it won't be neither Lyndon word nor the possible beginning of one. Thus, add first m-k collected symbols to the result list, remove them from the string, set m to 1 and k to 0 so they point to the second and first symbol of the string respectively.
  4. Add S to the result list.

Read more about this topic:  Lyndon Word

Famous quotes containing the word standard:

    Error is a supposition that pleasure and pain, that intelligence, substance, life, are existent in matter. Error is neither Mind nor one of Mind’s faculties. Error is the contradiction of Truth. Error is a belief without understanding. Error is unreal because untrue. It is that which seemeth to be and is not. If error were true, its truth would be error, and we should have a self-evident absurdity—namely, erroneous truth. Thus we should continue to lose the standard of Truth.
    Mary Baker Eddy (1821–1910)