Problems With Maximum Parsimony Phylogeny Estimation
Maximum parsimony is a very simple approach, and is popular for this reason. However, it is not statistically consistent. That is, it is not guaranteed to produce the true tree with high probability, given sufficient data. Consistency, here meaning the monotonic convergence on the correct answer with the addition of more data, is a desirable property of any statistical method. As demonstrated in 1978 by Joe Felsenstein, maximum parsimony can be inconsistent under certain conditions. The category of situations in which this is known to occur is called long branch attraction. It occurs, for example, where two taxa (A and C) sit on long branches (a high level of substitutions) while two others (B and D) sit on short branches, and where A and B diverged from one common ancestor and C and D from another.
Assume for simplicity that we are considering a single binary character (it can either be + or -). Because the distance from B to D is small, in the vast majority of all cases, B and D will be the same. Here, we will assume that they are both + (+ and - are assigned arbitrarily and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be -, in which case only one taxon differs from the rest, and we cannot learn anything, as all trees have the same length. Similarly, A can be - and C can be +. The only remaining possibility is that A and C are both -. In this case, however, parsimony groups A and C together, and B and D together, which is not the true tree. Because the branches leading to A and C are long, this misleading pattern arises more often than the pattern that supports the true grouping. As a consequence, when we have a tree of this type, the more data we collect (i.e. the more characters we study), the more we tend towards the wrong tree.
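The argument above can be checked numerically. The following is a minimal sketch, not Felsenstein's own derivation: it assumes a symmetric two-state model in which a character changes state along a branch with a fixed probability, and it assumes the short internal branch has the same change probability as the short terminal branches. The names `branch_prob`, `pattern_support`, `favoured_topology`, `p_long` and `q_short` are illustrative only. For four taxa, parsimony prefers whichever grouping is supported by the most frequent informative pattern, so the expected pattern frequencies show which tree is recovered as characters accumulate.

```python
from itertools import product


def branch_prob(parent, child, change):
    """Probability of the child's state given the parent's, for one branch."""
    return change if parent != child else 1.0 - change


def pattern_support(p_long, q_short):
    """Expected frequency of the informative patterns for each grouping.

    True tree (assumed): ((A,B),(C,D)), with internal nodes x (above A, B)
    and y (above C, D). A and C are on long branches (p_long); B, D and the
    internal branch are short (q_short).
    """
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for x, y, a, b, c, d in product((0, 1), repeat=6):
        prob = (
            0.5  # uniform prior on the state at node x
            * branch_prob(x, y, q_short)
            * branch_prob(x, a, p_long)
            * branch_prob(x, b, q_short)
            * branch_prob(y, c, p_long)
            * branch_prob(y, d, q_short)
        )
        # Only 2-versus-2 patterns distinguish the three possible trees.
        if a == b and c == d and a != c:
            support["AB|CD"] += prob
        elif a == c and b == d and a != b:
            support["AC|BD"] += prob
        elif a == d and b == c and a != b:
            support["AD|BC"] += prob
    return support


def favoured_topology(p_long, q_short):
    """Grouping that parsimony is expected to prefer with unlimited data."""
    support = pattern_support(p_long, q_short)
    return max(support, key=support.get)


if __name__ == "__main__":
    print(favoured_topology(0.4, 0.05))  # long branches to A and C
    print(favoured_topology(0.1, 0.1))   # all branches equal
```

With the illustrative values 0.4 and 0.05, the pattern grouping A with C is expected more often than the pattern grouping A with B, so additional characters reinforce the wrong tree; with equal branch lengths the true grouping is favoured.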
A simple and effective method for determining whether or not long branch attraction is affecting tree topology is the SAW method, named for Siddall and Whiting. If long branch attraction is suspected in a pair of taxa (A and B), simply remove taxon A ("saw" off the branch) and re-run the analysis. Then remove B and replace A, running the analysis again. If either of the taxa appears at a different branch point in the absence of the other, there is evidence of long branch attraction. Since long branches can't possibly attract one another when only one is in the analysis, consistent taxon placement between treatments would indicate long branch attraction is not a problem.
Several other methods of phylogeny estimation are available, including maximum likelihood, Bayesian phylogeny inference, neighbour joining, and quartet methods. Of these, the first two both use a likelihood function, and, if used properly, are theoretically immune to long-branch attraction. These methods are both parametric, meaning that they rely on an explicit model of character evolution. It has been shown that, for some suboptimal models, these methods can also be inconsistent.
Another complication with maximum parsimony is that finding the most parsimonious tree is an NP-hard problem. The only currently available, efficient way of obtaining a solution, given an arbitrarily large set of taxa, is by using heuristic methods which do not guarantee that the most parsimonious tree will be recovered. These methods employ hill-climbing algorithms to progressively approach the best tree. However, it has been shown that there can be "tree islands" of suboptimal solutions, and the analysis can become trapped in these local optima. Thus, complex, flexible heuristics are required to ensure that tree space has been adequately explored. Several heuristics are available, including nearest neighbor interchange (NNI), tree bisection and reconnection (TBR), and the phylogenetic ratchet. This problem is certainly not unique to maximum parsimony; any method that uses an optimality criterion faces the same problem, and none offers an easy solution.
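To give a rough sense of why exhaustive search quickly becomes impractical, the sketch below counts the distinct unrooted, fully bifurcating trees for a given number of taxa, using the standard result that there are (2n - 5)!! of them for n ≥ 3. This count is background added for illustration and is not part of the argument above; the function name `count_unrooted_binary_trees` is hypothetical.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted binary trees on n_taxa labelled tips.

    Uses the double factorial (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5 (empty when n_taxa == 3).
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, count_unrooted_binary_trees(n))
```

Four taxa give only 3 candidate trees and ten taxa give 2,027,025, but the count grows faster than exponentially, which is why hill-climbing heuristics are used for larger data sets.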