Lyndon Word - Standard Factorization

Standard Factorization

According to the Chen–Fox–Lyndon theorem, every string may be formed in a unique way by concatenating a sequence of Lyndon words, in such a way that the words in the sequence are nonincreasing lexicographically. The final Lyndon word in this sequence is the lexicographically smallest suffix of the given string. A factorization into a nonincreasing sequence of Lyndon words (a so-called Lyndon factorization) may be constructed in linear time. Lyndon factorizations may be used as part of a bijective variant of the Burrows–Wheeler transform for data compression, and in algorithms for digital geometry.

Duval (1983) developed an algorithm for standard factorization that runs in linear time and constant space. It iterates over a string trying to find as long Lyndon word as possible. When it finds one, it adds it to the result list and proceeds to search in the remaining part of string. Resulting list of strings is standard factorization of a given string. More formal description of algorithm follows.

Given a string S of length N, one should proceed with the following steps:

  1. Let m be the index of the symbol-candidate to be appended to already collected ones. Initially, m = 1 (counting of symbols in a string starts from zero).
  2. Let k be the index of the symbol we would compare others to. Initially, k = 0.
  3. While k and m is less than N, compare S (kth symbol of a string S) to S. There are three possible outcomes:
    1. S is equal to S: append S to the current collected symbols. Increment k and m.
    2. S is less than S: if we append S to the current collected symbols we'll get Lyndon word. But we can't add it to the result list yet because it may be just a part of a larger Lyndon word. Thus, just increment m and set k to 0 so the next symbol would be compared to the first one in the string.
    3. S is greater than S: if we append S to the current collected symbols, it won't be neither Lyndon word nor the possible beginning of one. Thus, add first m-k collected symbols to the result list, remove them from the string, set m to 1 and k to 0 so they point to the second and first symbol of the string respectively.
  4. Add S to the result list.

Read more about this topic:  Lyndon Word

Famous quotes containing the word standard:

    If the Revolution has the right to destroy bridges and art monuments whenever necessary, it will stop still less from laying its hand on any tendency in art which, no matter how great its achievement in form, threatens to disintegrate the revolutionary environment or to arouse the internal forces of the Revolution, that is, the proletariat, the peasantry and the intelligentsia, to a hostile opposition to one another. Our standard is, clearly, political, imperative and intolerant.
    Leon Trotsky (1879–1940)