Suffix Tree - Functionality


A suffix tree for a string of length can be built in time, if the letters come from an alphabet of integers in a polynomial range (in particular, this is true for constant-sized alphabets). For larger alphabets, the running time is dominated by first sorting the letters to bring them into a range of size ; in general, this takes time. The costs below are given under the assumption that the alphabet is constant.

Assume that a suffix tree has been built for the string of length, or that a generalised suffix tree has been built for the set of strings of total length . You can:

  • Search for strings:
    • Check if a string of length is a substring in time.
    • Find the first occurrence of the patterns of total length as substrings in time.
    • Find all occurrences of the patterns of total length as substrings in time.
    • Search for a regular expression P in time expected sublinear in .
    • Find for each suffix of a pattern, the length of the longest match between a prefix of and a substring in in time. This is termed the matching statistics for .
  • Find properties of the strings:
    • Find the longest common substrings of the string and in time.
    • Find all maximal pairs, maximal repeats or supermaximal repeats in time.
    • Find the Lempel–Ziv decomposition in time.
    • Find the longest repeated substrings in time.
    • Find the most frequently occurring substrings of a minimum length in time.
    • Find the shortest strings from that do not occur in, in time, if there are such strings.
    • Find the shortest substrings occurring only once in time.
    • Find, for each, the shortest substrings of not occurring elsewhere in in time.

The suffix tree can be prepared for constant time lowest common ancestor retrieval between nodes in time. One can then also:

  • Find the longest common prefix between the suffixes and in .
  • Search for a pattern P of length m with at most k mismatches in time, where z is the number of hits.
  • Find all maximal palindromes in ,, or time if gaps of length are allowed, or if mismatches are allowed.
  • Find all tandem repeats in, and k-mismatch tandem repeats in .
  • Find the longest substrings common to at least strings in for in time.
  • Find the longest palindromic substring of a given string (using the suffix trees of both the string and its reverse) in linear time.

Read more about this topic:  Suffix Tree