Multiple Sequence Alignment - Hidden Markov Models

Hidden Markov Models

Hidden Markov models are probabilistic models that can assign likelihoods to all possible combinations of gaps, matches, and mismatches to determine the most likely MSA or set of possible MSAs. HMMs can produce a single highest-scoring output but can also generate a family of possible alignments that can then be evaluated for biological significance. HMMs can produce both global and local alignments. Although HMM-based methods have been developed relatively recently, they offer significant improvements in computational speed, especially for sequences that contain overlapping regions.

Typical HMM-based methods work by representing an MSA as a form of directed acyclic graph known as a partial-order graph, which consists of a series of nodes representing possible entries in the columns of an MSA. In this representation a column that is absolutely conserved (that is, that all the sequences in the MSA share a particular character at a particular position) is coded as a single node with as many outgoing connections as there are possible characters in the next column of the alignment. In the terms of a typical hidden Markov model, the observed states are the individual alignment columns and the "hidden" states represent the presumed ancestral sequence from which the sequences in the query set are hypothesized to have descended. An efficient search variant of the dynamic programming method, known as the Viterbi algorithm, is generally used to successively align the growing MSA to the next sequence in the query set to produce a new MSA. This is distinct from progressive alignment methods because the alignment of prior sequences is updated at each new sequence addition. However, like progressive methods, this technique can be influenced by the order in which the sequences in the query set are integrated into the alignment, especially when the sequences are distantly related.

Several software programs are available in which variants of HMM-based methods have been implemented and which are noted for their scalability and efficiency, although properly using an HMM method is more complex than using more common progressive methods. The simplest is POA (Partial-Order Alignment); a similar but more generalized method is implemented in the packages SAM (Sequence Alignment and Modeling System). and HMMER. SAM has been used as a source of alignments for protein structure prediction to participate in the CASP structure prediction experiment and to develop a database of predicted proteins in the yeast species S. cerevisiae. HHsearch is a software package for the detection of remotely related protein sequences based on the pairwise comparison of HMMs. A server running HHsearch (HHpred) was by far the fastest of the 10 best automatic structure prediction servers in the CASP7 and CASP8 structure prediction competitions.

Read more about this topic:  Multiple Sequence Alignment

Famous quotes containing the words hidden and/or models:

    Opinions are to the vast apparatus of social existence what oil is to machines: one does not go up to a turbine and pour machine oil over it; one applies a little to hidden spindles and joints that one has to know.
    Walter Benjamin (1892–1940)

    Today it is not the classroom nor the classics which are the repositories of models of eloquence, but the ad agencies.
    Marshall McLuhan (1911–1980)