This is a guest post in my series "The Story Behind the Paper". The post is by James O'Dwyer about his paper (coauthored with Steven Kembel and Tom Sharpton) in PNAS, entitled "Backbones of evolutionary history test biodiversity theory for microbes".
Prehistory
This paper has its roots going back a few years, and it
all started off fairly innocuously. A previous collaboration with Steve
Kembel and Jessica Green resulted in this earlier paper, where we had the lofty goal of
encouraging microbial ecologists to throw out slightly less data, and which also
attracted Jonathan’s attention for its microbiome figures. One of the central questions
in ecology is to explain and understand patterns of biodiversity: for example,
by quantifying the diversity of a local community (“alpha” diversity), or
similarity between multiple local communities (“beta” diversity). In microbial ecology it is common to use
evolutionary history to quantify these measures. But both phylogenetic alpha
and beta-diversity tend to change systematically with increasing sample size,
making it difficult to compare results for samples of different sizes.
Our idea in the earlier paper was to find a fast way
to compute a null prediction for phylogenetic alpha and beta
diversity. This would provide a way to standardize the results for sample
size, and hence we could use full samples rather than smaller, rarefied
samples. The solution was relatively simple, and involved a phylogenetic
analogue of the Species Abundance Distribution (SAD), which we called the
Edge-length Abundance Distribution (EAD). In comparison with the SAD,
this distribution replaces species units with subclades of a phylogenetic tree,
replaces species abundances with subclade size, and inserts branch length
weightings in a specific way.
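To make the idea concrete, here is a minimal sketch in Python. It assumes the simplest reading of the EAD: for each subclade size k, add up the lengths of all edges that sit above exactly k tips, with every tip counted as one individual. The toy tree, the function names, and the rooted definition of phylogenetic diversity used in `expected_pd` are illustrative assumptions, not the exact weighting used in the paper. The second function shows why such a distribution is handy: under random subsampling without replacement, it gives the expected phylogenetic diversity at any sample size without having to rarefy.

```python
import math
from collections import defaultdict

# Hypothetical toy tree: each node is (edge_length_above_node, children);
# a tip has an empty list of children. Names and numbers are illustrative only.
toy_tree = (0.0, [
    (1.0, [(2.0, []), (2.0, [])]),
    (0.5, [(2.5, []), (1.5, [(1.0, []), (1.0, [])])]),
])

def edge_length_abundance(tree):
    """Return {subclade size k: total length of edges with exactly k tips below}."""
    ead = defaultdict(float)

    def visit(node):
        length, children = node
        n_tips = 1 if not children else sum(visit(child) for child in children)
        ead[n_tips] += length
        return n_tips

    visit(tree)
    return dict(ead)

def expected_pd(ead, n_total, n_sampled):
    """Expected rooted PD of a random subsample of n_sampled out of n_total tips."""
    total = 0.0
    for k, length in ead.items():
        # Probability that none of the k tips below an edge ends up in the sample.
        p_missed = math.comb(n_total - k, n_sampled) / math.comb(n_total, n_sampled)
        total += length * (1.0 - p_missed)
    return total

ead = edge_length_abundance(toy_tree)
for m in range(1, 6):
    print(m, round(expected_pd(ead, 5, m), 3))
```

With a sample of one tip the expectation is just the root-to-tip distance, and with the full sample it is the total branch length of the tree, so the curve in between is the null expectation for how diversity grows with sample size.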
The present day
Job done. So how did this lead to a new paper?
Well, this first study generated something slightly mysterious to us. In
theory, the EADs we computed from empirical data could have taken any form they
wanted to—and yet for various microbiome habitats, they all seemed to display a
very distinct power law scaling. Translated into a more concrete consequence, the
form of the EAD was such that phylogenetic diversity typically increased as a
power law function of sample size. There’s a history in ecology of
looking for (and sometimes finding) behavior that both takes on a power law
scaling, and is also universal across multiple systems, fitting with a general
sense that some patterns may be emergent and independent of much of the
underlying variation between communities. There’s also a history of looking for
(and sometimes finding) power law scaling in evolutionary trees, for example in
the number of species per genus, which has often been claimed to follow a power law.
Here we had found a link with these older ideas, with a nice combination of
new factors. First, we weren’t relying on human definitions of species,
which could certainly be biased towards generating power law scaling
artificially (e.g., the principle of balance). Second, we had large
numbers, so that these scaling behaviors spread over multiple orders of
magnitude. Third, there was an untapped world of microbial sequence data
to look at to see whether these patterns extended into microbiology.
With Tom and Steve, we combined these ideas to
set up the empirical side of this new paper: expand the original study across a
broader range of habitats, test whether the patterns are robust to different
alignment and inference methods, and see whether the same scaling behavior holds
up for this new range of samples. Indeed it did: Figures 1 and 3
in the new paper show that this power law scaling is present across multiple
microbial habitats.
Just knowing that this distribution takes a power law form is already useful on its own, because (again) it defines the null expectations for the way phylogenetic alpha and beta diversity change with sample size. But these results still left a number of open questions, centering around whether this could also give us some insight into which models of biodiversity could be consistent with what we were seeing. Could these scaling patterns provide evidence for whether a given ecological or evolutionary scenario had strongly influenced a community?
Coarse-graining: reducing the resolution of phylogenetic trees
The first modeling approach we considered is neutral theory.
Neutral models have provided the basic null models in fields stretching from
population genetics and ecology to cultural evolution and the
social sciences. What they have in common is the key assumption that selective
differences are irrelevant for predicting large-scale patterns. If the power law scaling is just an inevitability (an ecological version of Benford's law), it seemed likely that it
might be just a consequence of neutrality, with all of the variation and
mechanism somehow washing out. Is it possible that these observed phylogenetic
patterns are driven by this most basic, neutral model of biodiversity?
The answer turns out to be no. At least using the vanilla version of the
neutral theory, we don't reproduce these scaling behaviors.
Next, we got a little creative. When working with trees
generated by neutral processes, we were thinking of the Kingman coalescent, i.e. a
model of tree structure that works backwards in time, coalescing pairs of lineages at each node.
There's a one-parameter family of coalescent models generalizing the
Kingman coalescent, with the unifying feature that more than two lineages can
coalesce at each node. Viewed forward in time, one lineage can burst into many.
This generalized family, the Lambda-coalescent, produces precisely the power
law EAD (known in that context as a site-frequency spectrum) we were looking for.
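For readers who want to play with this, here is a rough simulation sketch. It assumes the Beta(2 - alpha, alpha) coalescent with 0 < alpha < 2, which is one commonly used one-parameter Lambda-coalescent family; the post does not say which family was used, so treat that choice, the function name, and the parameter values as assumptions. Working backwards in time, k of the b current lineages merge at total rate C(b, k) B(k - alpha, b - k + alpha) / B(2 - alpha, alpha), and the function tallies the total edge length sitting above each subclade size.

```python
import math
import random
from collections import defaultdict

def beta_coalescent_ead(n, alpha, rng):
    """Simulate one Beta(2 - alpha, alpha) coalescent tree on n tips (0 < alpha < 2).

    Returns ({subclade size: total edge length}, tree height).
    """
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def log_comb(b, k):
        return math.lgamma(b + 1) - math.lgamma(k + 1) - math.lgamma(b - k + 1)

    sizes = [1] * n              # number of tips below each active lineage
    ead = defaultdict(float)
    height = 0.0
    while len(sizes) > 1:
        b = len(sizes)
        # Total rate at which some set of k lineages merges, for k = 2..b.
        rates = [
            math.exp(log_comb(b, k)
                     + log_beta(k - alpha, b - k + alpha)
                     - log_beta(2 - alpha, alpha))
            for k in range(2, b + 1)
        ]
        wait = rng.expovariate(sum(rates))
        height += wait
        for s in sizes:          # every active edge grows by the waiting time
            ead[s] += wait
        k = rng.choices(range(2, b + 1), weights=rates)[0]
        merged = set(rng.sample(range(b), k))
        new_size = sum(sizes[i] for i in merged)
        sizes = [s for i, s in enumerate(sizes) if i not in merged] + [new_size]
    return dict(ead), height

rng = random.Random(1)           # fixed seed so the run is repeatable
sim_ead, sim_height = beta_coalescent_ead(200, 1.5, rng)
for k in sorted(sim_ead)[:5]:
    print(k, sim_ead[k])
```

A single tree is noisy, so any scaling would only show up after averaging over many replicates and plotting edge length against subclade size on log-log axes.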
These generalized coalescent trees have previously been used to understand population processes with a skewed offspring distribution, where there is a significant probability that an organism has a large number of offspring. This matches the idea of multiple lineages coalescing. But for our evolutionary trees that idea of instantaneous, multiple
branching seemed unlikely. At a fine-grained level, branches in our evolutionary
trees ought to split into two, driven by cell division and subsequent
diversification. This is also what our tree inference algorithms are designed
to find, even when our sequence data likely isn't sufficient to resolve all polytomies.
So how could these generalized coalescent trees possibly be consistent with our
empirical trees?
Instead of trying to resolve as many polytomies as possible, we decided to go in the other direction. We imagined reducing the resolution at which we could distinguish the order of branching events. Applying this "coarse-graining", we would certainly generate polytomies, as the multiple nodes in a fast burst of branching collapse into one. Still, much like the EAD, there was no guarantee for what the distribution of polytomy sizes would be after this coarse-graining, or whether it would match these theoretical models. So our second surprise is that the distribution of burst sizes is also a power law, qualitatively consistent with the same distribution in the Lambda-coalescent.
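Here is one way the coarse-graining idea could be sketched in Python. It assumes a very simple rule: any internal edge shorter than a chosen resolution is treated as unresolvable, so the node below it is merged into its parent and its children are re-attached with lengthened edges. The actual procedure in the paper may differ (for example, it could bin node times instead), and the toy tree and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy tree: each node is (edge_length_above_node, children);
# a tip has an empty list of children.
burst_tree = (0.0, [
    (1.0, [
        (0.1, [(1.9, []), (1.9, [])]),
        (0.2, [(1.8, []), (0.1, [(1.7, []), (1.7, [])])]),
    ]),
    (3.0, []),
])

def coarse_grain(node, resolution):
    """Collapse internal edges shorter than `resolution` into polytomies."""
    length, children = node
    new_children = []
    for child in children:
        c_length, c_children = coarse_grain(child, resolution)
        if c_children and c_length < resolution:
            # Edge too short to resolve: promote the grandchildren and lengthen
            # their edges so that root-to-tip distances are unchanged.
            new_children.extend((g_len + c_length, g_kids) for g_len, g_kids in c_children)
        else:
            new_children.append((c_length, c_children))
    return (length, new_children)

def burst_sizes(node):
    """Count how many internal nodes have each number of children."""
    _, children = node
    counts = Counter()
    if children:
        counts[len(children)] += 1
        for child in children:
            counts += burst_sizes(child)
    return counts

def tip_depths(node, depth=0.0):
    """Return the list of root-to-tip distances."""
    length, children = node
    depth += length
    if not children:
        return [depth]
    return [d for child in children for d in tip_depths(child, depth)]

coarse_tree = coarse_grain(burst_tree, 0.5)
print(burst_sizes(burst_tree))
print(burst_sizes(coarse_tree))
```

On this toy tree, three closely spaced binary splits merge into a single node with five children, which is the kind of burst whose size distribution is discussed above.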
Outlook
So this seems to be the beginning of a very nice story,
with a lot of open questions. Empirical trees display bursts of
branching, which quickly collapse to polytomies under coarse-graining, and the
distribution of sizes of these bursts is a power law. The
Lambda-coalescent is likely not the end of the story, but at least
suggests that this distribution is tied together with the scaling behavior of
phylogenetic diversity.
What's next? Certainly lots of empirical questions.
Does this behavior extend over an even broader range of samples?
And will it still hold if we have better, longer sequence data?
There are also theoretical questions, mostly centering around whether we
can relate parsimonious but mechanistic models to the bursty tree structures,
and how best to evaluate and compare these models. One take-home message
stands out for me. Simplified models of biodiversity, like neutral models
and their generalizations, likely won't ever capture the fine-grained dynamical
behavior of an ecological community. But they might just tell us something
about coarse-grained dynamical behavior, and coarse-grained phylogenies could
be a nice part of this story. Let's see if coarse-grained patterns can be matched with coarse-grained process.