Multiple Sequence Alignment - Hidden Markov Models

Hidden Markov Models

Hidden Markov models (HMMs) are probabilistic models that can assign likelihoods to all possible combinations of gaps, matches, and mismatches to determine the most likely multiple sequence alignment (MSA) or set of possible MSAs. HMMs can produce a single highest-scoring output, but they can also generate a family of possible alignments that can then be evaluated for biological significance. HMMs can produce both global and local alignments. Although HMM-based methods have been developed relatively recently, they offer significant improvements in computational speed, especially for sequences that contain overlapping regions.
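As a small illustration of how an HMM assigns a likelihood to a sequence, the sketch below sums over every hidden-state path with the forward algorithm. The two-state model ("match" and "insert"), its probabilities, and the name forward_likelihood are hypothetical choices made for this example; they are not taken from any of the packages discussed here, and real alignment HMMs use a richer state architecture.

```python
# Toy two-state HMM over nucleotides. All numbers are illustrative assumptions.
STATES = ("match", "insert")
START_P = {"match": 0.9, "insert": 0.1}
TRANS_P = {
    "match": {"match": 0.8, "insert": 0.2},
    "insert": {"match": 0.6, "insert": 0.4},
}
EMIT_P = {
    "match": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    "insert": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations), summed over all hidden-state paths."""
    # alpha[s] = probability of the symbols seen so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


likelihood = forward_likelihood("AGGA", STATES, START_P, TRANS_P, EMIT_P)
print(likelihood)
```

Because the forward algorithm adds up the contribution of every path rather than keeping only the best one, it corresponds to scoring the whole family of possible alignments instead of a single highest-scoring output.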

Typical HMM-based methods work by representing an MSA as a form of directed acyclic graph known as a partial-order graph, which consists of a series of nodes representing possible entries in the columns of an MSA. In this representation, a column that is absolutely conserved (that is, one in which all the sequences in the MSA share the same character) is coded as a single node with as many outgoing connections as there are possible characters in the next column of the alignment. In HMM terms, the observed states are the individual alignment columns, and the "hidden" states represent the presumed ancestral sequence from which the sequences in the query set are hypothesized to have descended. An efficient dynamic programming search method, the Viterbi algorithm, is generally used to align the growing MSA to the next sequence in the query set, producing a new MSA at each step. This differs from progressive alignment methods in that the alignment of prior sequences is updated each time a new sequence is added. However, like progressive methods, this technique can be influenced by the order in which the sequences in the query set are integrated into the alignment, especially when the sequences are distantly related.
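The Viterbi step can be sketched on a generic HMM as follows. This is a minimal illustration under stated assumptions, not the procedure used by any particular alignment package: the two hidden states ("match" and "insert"), all probabilities, and the function name viterbi are hypothetical, and the example decodes a single sequence rather than aligning a sequence to a partial-order graph. Log probabilities are used to avoid numerical underflow on longer inputs.

```python
import math

# Toy two-state HMM over nucleotides. All numbers are illustrative assumptions.
STATES = ("match", "insert")
START_P = {"match": 0.9, "insert": 0.1}
TRANS_P = {
    "match": {"match": 0.8, "insert": 0.2},
    "insert": {"match": 0.6, "insert": 0.4},
}
EMIT_P = {
    "match": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    "insert": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, state path) of the single most likely path."""
    # best[t][s] = log probability of the best path that ends in state s at t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


log_p, path = viterbi("AGGA", STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['match', 'insert', 'insert', 'match']
```

With these toy parameters, the two G characters are better explained by the uniform "insert" state than by the A-biased "match" state, so the most likely path switches states in the middle of the sequence.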

Several software programs implement variants of HMM-based methods and are noted for their scalability and efficiency, although using an HMM method properly is more complex than using the more common progressive methods. The simplest is POA (Partial-Order Alignment); a similar but more generalized method is implemented in the packages SAM (Sequence Alignment and Modeling System) and HMMER. SAM has been used as a source of alignments for protein structure prediction in the CASP structure prediction experiment and to develop a database of predicted proteins in the yeast species S. cerevisiae. HHsearch is a software package for the detection of remotely related protein sequences based on the pairwise comparison of HMMs. A server running HHsearch (HHpred) was by far the fastest of the 10 best automatic structure prediction servers in the CASP7 and CASP8 structure prediction competitions.
