Tuesday, December 22, 2009

Story behind the story for new #PLoSOne paper on Bayesian phylogenetics


There is an interesting new paper in PLoS One, "Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics" by Bryan Kolaczkowski and Joseph Thornton. The work focuses on methods for inferring phylogenetic history and in particular two types of statistical approaches: Likelihood and Bayesian.  These methods are related to each other in that both use statistical models of evolution and then test different possible phylogenetic trees relating taxa by how well certain data sets about those taxa map onto the different possible trees.  What they did in this new paper was test these methods with some simulations and with some mathematical analyses.  And somewhat surprisingly, they find that Bayesian methods, which have become more popular recently, appear to be more prone to errors than likelihood methods when the data sets include multiple taxa that are not closely related and have long branches.  (Note: if you want to learn more about phylogenetic methods, you can look at the online chapter (html format or PDF) from my Evolution Textbook, though I confess this needs a bit of revision, which I am working on now.)

What they see in these cases is that the taxa with long branches group together, something known generally as "Long Branch Attraction" (LBA).  Though there have been many previous studies of LBA, most have ended up showing that statistical methods are less prone to this problem than other phylogenetic methods, like distance and parsimony methods. What is surprising in this new work is that they find that Bayesian methods are highly prone to LBA, and much more so than likelihood methods.

Anyway, for more on this one could read the paper.  But what I thought might be interesting is to ask the authors for more detail directly.  I am hoping to do this more and more with PLoS papers in the future. I was inspired to do this, in fact, by one of the authors of this paper, Joe Thornton.  He sent me an email with a link to the paper saying he thought I might be interested in it (true). He also said he felt that, for a PLoS One paper, it was partly his job to make sure it got read by the right audience, so he was hoping I might blog about it.  And I said sure, but only if he gave me some of the "story behind the story". So here it is below:

Why did you do these experiments?
Why did we do these experiments? A few years ago, we were studying the behavior of Bayesian posterior probabilities on clades -- whether or not they accurately predict the probability that a clade is true, and what kinds of conditions might cause them to deviate from this ideal. We found that when the true tree was in the Felsenstein zone (two non-sister long branches separated by short branches), the long branches were often incorrectly grouped together with strong support. This was just a small part of a much larger paper that was published in MBE in 2008. The suggestion that Bayesian inference (BI) might be biased in favor of a false tree was surprising and intriguing, because we -- like most people in the field -- had assumed that BI would have the desirable statistical properties of ML (e.g., nearly unbiased inference and statistical consistency -- convergence on the true tree with increasing support as the amount of data grows and the evolutionary model is correct, etc.). So we began doing experiments to rigorously explore the nature of the bias and its causes. When we found that BI was statistically inconsistent and the cause was integrating over branch lengths, we knew this result would be controversial, so we wanted to be sure the experiments were truly airtight. We supplemented our initial simulations with analyses of empirical data, with simulations under a wide variety of conditions using all types of priors, as well as mathematical and numerical analyses to clearly demonstrate the reasons for the bias. We also developed software that was identical to fully Bayesian MCMC except that it does not integrate over branch lengths; this method is not subject to the bias that BI displays, clearly demonstrating the cause of the bias.
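To make the phrase "integrating over branch lengths" concrete, here is a minimal sketch of the two ways a single branch length can be handled: maximized (as in ML) or averaged over a prior (as in BI). It is a toy illustration only, not the authors' software and not a reproduction of their analysis. The Jukes-Cantor (JC69) model, the two-sequence setup, the exponential prior with rate 10, and the counts of 100 sites and 20 differences are all assumptions chosen for illustration.

```python
import math

def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of branch length t for two aligned sequences under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # joint probability of a matching site
    p_diff = 0.25 * (0.25 - 0.25 * e)  # joint probability of one specific mismatch
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def midpoint_grid(t_max=10.0, step=0.001):
    """Midpoints of equal-width bins on (0, t_max); avoids t = 0, where log(0) occurs."""
    n = int(round(t_max / step))
    return [(i + 0.5) * step for i in range(n)]

def maximized_log_likelihood(n_sites, n_diff, grid):
    """ML-style treatment: keep only the best-fitting branch length."""
    best_t = max(grid, key=lambda t: jc69_log_likelihood(t, n_sites, n_diff))
    return best_t, jc69_log_likelihood(best_t, n_sites, n_diff)

def integrated_log_likelihood(n_sites, n_diff, grid, step, prior_rate=10.0):
    """Bayesian-style treatment: average the likelihood over an exponential prior.

    The prior and its rate are assumptions for this sketch. The integral is
    approximated with the midpoint rule and summed in log space for stability.
    """
    terms = [
        jc69_log_likelihood(t, n_sites, n_diff)
        + math.log(prior_rate) - prior_rate * t  # log prior density
        + math.log(step)                         # bin width
        for t in grid
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(x - m) for x in terms))

STEP = 0.001
GRID = midpoint_grid(step=STEP)
N_SITES, N_DIFF = 100, 20  # hypothetical data

t_hat, ll_max = maximized_log_likelihood(N_SITES, N_DIFF, GRID)
ll_int = integrated_log_likelihood(N_SITES, N_DIFF, GRID, STEP)
print("ML branch length:", t_hat)
print("maximized log-likelihood:", ll_max)
print("integrated log-likelihood:", ll_int)
```

Because the integrated value is a prior-weighted average of the likelihood, it cannot exceed the maximized value, and how far below it falls depends on the prior. This sketch shows only that the two quantities differ; it says nothing by itself about whether either approach is biased on a real tree.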

Why did you send this to PLoS One?
Why did we submit to PLoS One? We think this paper has profound implications for phylogenetic practice and theory, and we want it to have a wide audience. Our experience with the review process in phylogenetic methods, unfortunately, is that many reviewers evaluate manuscripts based on whether or not the results confirm their world-view. This is a legacy of decades of internecine warfare in the field between the adherents of different methodological camps. We write papers in other fields, and while peer-review always has its ups and downs, our experience in phylogenetics is unusual in that solid papers are often rejected for philosophical reasons rather than for reasons of scientific validity and quality. We know this paper will be controversial, and we didn't want it to be shot down in the review process for partisan reasons. PLoS One seemed like the perfect place to get the paper out and let the scientific community evaluate whether the experiments are convincing or not.
This is our first time publishing in PLoS One. I confess to being a little bit anxious that the paper will be lost in the great tide of papers published in the journal. We know our paper is very strong -- I think it's perhaps the most convincing and complete analysis of any problem I've ever published -- so we're confident that the work can have an impact, as long as the attention of readers in the field is drawn to it.

Where is the other author these days?
Bryan is now a postdoc in Andy Kern's lab at Dartmouth.

Kolaczkowski, B., & Thornton, J. (2009). Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics. PLoS ONE, 4 (12). DOI: 10.1371/journal.pone.0007891


  1. Nice review. My account with "research blogging" was just approved, so I am eager to try it out as well.

    On topic, I was just starting to shift over to the BI camp from ML, so this bias is interesting to me.

  2. Well, it may be that the "problems" they see with Bayesian methods are caused more by the implementation of the Bayesian approach than by the Bayesian approach per se. So let's reserve judgement until this gets sorted out in more detail, perhaps by people more in tune w/ Bayesian methods than, well, me.

  3. I've posted this on dechronization already, so my apologies for cross-posting.

    This issue has been sorted out by Mike Steel, who found an error in one of Kolaczkowski and Thornton's key equations. Long story short: Bayesian phylogenetic estimation is consistent.

    Given how vocal Joe Thornton was a couple of months ago, I find his silence now a little disingenuous. Maybe he is just busy...

    Kolaczkowski and Thornton's gag inspired me to start a blog, where I plan to document statistical mistakes that we do not want to forget.

  4. Thanks quantifier - had not seen the correction ...