Maximum Parsimony (phylogenetics) - Problems With Maximum Parsimony Phylogeny Estimation

Maximum parsimony is a very simple approach, and is popular for this reason. However, it is not statistically consistent: it is not guaranteed to produce the true tree with high probability, given sufficient data. Consistency, here meaning convergence on the correct answer as more data are added, is a desirable property of any statistical method. As Joe Felsenstein demonstrated in 1978, maximum parsimony can be inconsistent under certain conditions. The category of situations in which this is known to occur is called long branch attraction. It occurs, for example, where two taxa (A and C) sit on long branches (a high level of substitutions) while two others (B and D) sit on short branches. A and B diverged from a common ancestor, as did C and D, so the true tree groups A with B and C with D.

Assume for simplicity that we are considering a single binary character (it can be either + or -). Because the distance from B to D is small, in the vast majority of cases B and D will be the same. Here, we will assume that they are both + (+ and - are assigned arbitrarily, and swapping them is only a matter of definition). If this is the case, there are four possibilities for A and C. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be -, in which case only one taxon differs from the rest, and we cannot learn anything, as all trees have the same length. Similarly, A can be - and C can be +. The only remaining possibility is that A and C are both -. In this case, however, parsimony groups A and C together, and B and D together, which is not the true tree. Because A and C lie on long branches, this pattern can arise through independent changes more often than the pattern that supports the true grouping. As a consequence, when we have a tree of this type, the more data we collect (i.e. the more characters we study), the more we tend towards the wrong tree.
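The argument above can be checked by brute force. The following Python sketch is illustrative only. It scores each +/- pattern on the three possible unrooted four-taxon trees, and then sums the probability of the patterns that uniquely favour each tree when the true tree groups A with B. The probabilities assume a symmetric two-state model of change, which the text does not specify; the function names and the branch parameters `p` (long branches leading to A and C) and `q` (all other branches) are likewise assumptions made for the example.

```python
from itertools import product

# The three unrooted topologies for taxa A, B, C, D. In the example in the
# text the true tree is AB|CD, with A and C on the long branches.
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def quartet_length(states, topology):
    """Parsimony length (Fitch count) of one binary character on a quartet."""
    steps = 0
    sides = []
    for x, y in topology:
        if states[x] == states[y]:
            sides.append({states[x]})
        else:
            sides.append({states[x], states[y]})
            steps += 1  # one change needed within this pair
    if not (sides[0] & sides[1]):
        steps += 1      # one change needed on the internal branch
    return steps


def change(u, v, r):
    """Probability of ending in state v from state u on a branch with change probability r."""
    return r if u != v else 1.0 - r


def pattern_probability(states, p, q):
    """Probability of a pattern on the true tree AB|CD.

    ASSUMPTION: symmetric two-state model, uniform ancestral state,
    change probability p on the branches to A and C, q elsewhere.
    """
    total = 0.0
    for x, y in product("+-", repeat=2):  # states at the two internal nodes
        total += (0.5
                  * change(x, states["A"], p) * change(x, states["B"], q)
                  * change(x, y, q)
                  * change(y, states["C"], p) * change(y, states["D"], q))
    return total


def expected_support(p, q):
    """Total probability of the patterns that uniquely favour each topology."""
    support = dict.fromkeys(TOPOLOGIES, 0.0)
    for values in product("+-", repeat=4):
        states = dict(zip("ABCD", values))
        lengths = {name: quartet_length(states, topo)
                   for name, topo in TOPOLOGIES.items()}
        shortest = min(lengths.values())
        winners = [name for name, n in lengths.items() if n == shortest]
        if len(winners) == 1:  # uninformative patterns favour no tree
            support[winners[0]] += pattern_probability(states, p, q)
    return support


if __name__ == "__main__":
    for p in (0.1, 0.4):
        print(p, expected_support(p, q=0.05))
```

Under these assumptions, with p = 0.4 and q = 0.05 the patterns favouring the incorrect AC|BD grouping carry more probability than those favouring AB|CD, so adding characters adds net support for the wrong tree. With p = 0.1 the order is reversed and parsimony favours the true tree.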

A simple and effective method for determining whether or not long branch attraction is affecting tree topology is the SAW method, named for Siddall and Whiting. If long branch attraction is suspected in a pair of taxa (call them A and B), simply remove taxon A ("saw" off the branch) and re-run the analysis. Then restore A, remove B, and run the analysis again. If either taxon appears at a different branch point in the absence of the other, there is evidence of long branch attraction. Since long branches cannot attract one another when only one of them is in the analysis, consistent taxon placement between treatments would indicate that long branch attraction is not a problem.
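The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the published protocol: `infer_splits` is a hypothetical callable standing in for whatever tree-building program is used, and representing a tree as a set of splits (each split given as the set of taxa on one side of an internal branch) is an assumption made to keep the comparison simple.

```python
def restrict(splits, taxa_kept):
    """Prune a set of splits to `taxa_kept`, dropping splits that become trivial.

    Each split is a collection of the taxa on one side of an internal branch.
    The side that does not contain a fixed reference taxon is stored, so that
    the same branch always gets the same representation.
    """
    kept = frozenset(taxa_kept)
    reference = min(kept)
    pruned = set()
    for side in splits:
        part = frozenset(side) & kept
        if reference in part:
            part = kept - part
        if 2 <= len(part) <= len(kept) - 2:  # both sides hold at least two taxa
            pruned.add(part)
    return pruned


def saw_test(alignment, taxon_a, taxon_b, infer_splits):
    """Run the two 'sawn-off' analyses and compare them with the full analysis.

    `alignment` maps taxon names to character strings. `infer_splits` is a
    HYPOTHETICAL function: it takes such a mapping and returns the splits of
    the inferred tree. Returns a dict saying, for each suspect taxon, whether
    it keeps its position when the other suspect taxon is removed.
    """
    taxa = set(alignment)
    full_tree = infer_splits(alignment)
    stable = {}
    for removed, remaining in ((taxon_a, taxon_b), (taxon_b, taxon_a)):
        reduced = {t: seq for t, seq in alignment.items() if t != removed}
        expected = restrict(full_tree, taxa - {removed})
        observed = restrict(infer_splits(reduced), taxa - {removed})
        stable[remaining] = (expected == observed)
    return stable
```

A value of `False` for either taxon means that it changed position once the other long branch was removed, which is the signal the SAW method looks for.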

Several other methods of phylogeny estimation are available, including maximum likelihood, Bayesian phylogenetic inference, neighbour joining, and quartet methods. Of these, the first two both use a likelihood function and, if used properly, are theoretically immune to long branch attraction. Both are parametric, meaning that they rely on an explicit model of character evolution. It has been shown, however, that under some suboptimal models these methods can also be inconsistent.

Another complication with maximum parsimony is that finding the most parsimonious tree is an NP-hard problem. The only currently available, efficient way of obtaining a solution, given an arbitrarily large set of taxa, is to use heuristic methods, which do not guarantee that the most parsimonious tree will be recovered. These methods employ hill-climbing algorithms to progressively approach the best tree. However, it has been shown that there can be "tree islands" of suboptimal solutions, and the analysis can become trapped in these local optima. Thus, complex, flexible heuristics are required to explore tree space adequately. Several heuristics are available, including nearest neighbor interchange (NNI), tree bisection and reconnection (TBR), and the phylogenetic ratchet. This problem is certainly not unique to MP; any method that uses an optimality criterion faces the same problem, and none offers an easy solution.
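A hill-climbing search of the kind described above can be sketched as follows. This is a deliberately minimal Python illustration, not the algorithm of any particular program: tree length is counted with Fitch's method for unordered characters, trees are nested pairs rooted on a fixed outgroup, and the search accepts only strictly shorter NNI neighbours. The function names and the five-taxon toy alignment are invented for the example.

```python
def fitch(tree, column):
    """Return (state set, number of changes) for one character on a rooted tree."""
    if isinstance(tree, str):  # a leaf is a taxon name
        return {column[tree]}, 0
    (left_set, left_steps), (right_set, right_steps) = (
        fitch(tree[0], column), fitch(tree[1], column))
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, alignment):
    """Total parsimony length of `tree` over all characters in `alignment`."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites))


def nni_neighbours(tree):
    """Yield every tree one nearest neighbor interchange away from `tree`.

    `tree` is the part of the tree below the outgroup, so each internal
    branch of the unrooted tree is visited once and yields two neighbours.
    """
    if isinstance(tree, str):
        return
    left, right = tree
    for sibling, child in ((left, right), (right, left)):
        if not isinstance(child, str):
            first, second = child
            yield (first, (sibling, second))  # swap sibling with `first`
            yield (second, (first, sibling))  # swap sibling with `second`
    for neighbour in nni_neighbours(left):
        yield (neighbour, right)
    for neighbour in nni_neighbours(right):
        yield (left, neighbour)


def hill_climb(tree, alignment):
    """Strict hill climbing: move only to a strictly shorter NNI neighbour."""
    outgroup, rest = tree
    best = tree_length(tree, alignment)
    improved = True
    while improved:
        improved = False
        for neighbour in nni_neighbours(rest):
            length = tree_length((outgroup, neighbour), alignment)
            if length < best:
                rest, best, improved = neighbour, length, True
                break
    return (outgroup, rest), best


# Toy data invented for illustration: one character supports A+B, one C+D.
ALIGNMENT = {"O": "00", "A": "10", "B": "10", "C": "01", "D": "01"}

if __name__ == "__main__":
    stuck = ("O", (("A", "C"), ("B", "D")))
    print(hill_climb(stuck, ALIGNMENT))   # stays at length 4
    better = ("O", ("D", ("C", ("A", "B"))))
    print(hill_climb(better, ALIGNMENT))  # reaches length 2
```

On this toy data the first starting tree has length 4 and all four of its NNI neighbours also have length 4, so the strict search stops there even though a tree of length 2 exists. The second starting tree reaches the shortest tree in one step. The outcome of a hill-climbing search therefore depends on where it starts, which is why practical searches use several starting trees and more extensive rearrangements.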
