Suffix Tree - External Construction

External Construction

Suffix trees quickly outgrow the main memory on standard machines for sequence collections in the order of gigabytes. As such, their construction calls for external memory approaches.

There are theoretical results for constructing suffix trees in external memory. The algorithm by Farach-Colton, Ferragina & Muthukrishnan (2000) is theoretically optimal, with an I/O complexity equal to that of sorting. However the overall intricacy of this algorithm has prevented, so far, its practical implementation.

On the other hand, there have been practical works for constructing disk-based suffix trees which scale to (few) GB/hours. The state of the art methods are TDD, TRELLIS, DiGeST, and B2ST.

TDD and TRELLIS scale up to the entire human genome – approximately 3GB – resulting in a disk-based suffix tree of a size in the tens of gigabytes,. However, these methods cannot handle efficiently collections of sequences exceeding 3GB. DiGeST performs significantly better and is able to handle collections of sequences in the order of 6GB in about 6 hours. . All these methods can efficiently build suffix trees for the case when the tree does not fit in main memory, but the input does. The most recent method, B2ST, scales to handle inputs that do not fit in main memory. ERA is a recent parallel suffix tree construction method that is significantly faster. ERA can index the entire human genome in 19 minutes on an 8-core desktop computer with 16GB RAM. On a simple Linux cluster with 16 nodes (4GB RAM per node), ERA can index the entire human genome in less than 9 minutes.

Read more about this topic:  Suffix Tree

Famous quotes containing the words external and/or construction:

    When a person hasn’t in him that which is higher and stronger than all external influences, it is enough for him to catch a good cold in order to lose his equilibrium and begin to see an owl in every bird, to hear a dog’s bark in every sound.
    Anton Pavlovich Chekhov (1860–1904)

    No real “vital” character in fiction is altogether a conscious construction of the author. On the contrary, it may be a sort of parasitic growth upon the author’s personality, developing by internal necessity as much as by external addition.
    —T.S. (Thomas Stearns)