Worth a look (from arXiv): Robust estimation of microbial diversity in theory and in practice

I confess I do not have the time right now to delve into this in detail, but this seems of interest: Robust estimation of microbial diversity in theory and in practice, by Bart Haegeman, Jerome Hamelin, John Moriarty, Peter Neal, Jonathan Dushoff and Joshua S. Weitz (full disclosure: I am friends with, and a co-author of, some of the authors here).

Abstract: Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao’s estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics (“Hill diversities”), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao’s estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
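
For readers who have not met Chao's estimator, here is a minimal Python sketch of the classic Chao1 form, S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of species seen exactly once and exactly twice in the sample. The function name, the example counts, and the fallback used when there are no doubletons are my own choices for illustration; the paper may use a different variant.

```python
def chao1(counts):
    """Chao1 lower estimate of species richness from per-species sample counts.

    Assumption: this is the classic Chao1 form, not necessarily the exact
    variant used in the paper discussed above.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # species observed in the sample
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # No doubletons: use the bias-corrected form to avoid dividing by zero.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical sample: 8 observed species, 4 singletons, 2 doubletons.
sample = [10, 5, 2, 2, 1, 1, 1, 1]
print(chao1(sample))  # 12.0
```

Note that the estimate uses only the observed count, the singletons and the doubletons. Any number of species too rare to appear in the sample leaves it unchanged, which is why it can only serve as a lower estimate.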

  1. Jonathan,

    Thanks for the post. I wanted to follow up with a short summary of what we did, focusing on the biological ramifications of our study. The analysis can be found on the arXiv or at ISME J.

    The inspiration for our manuscript lay, in part, in a recent claim within Mora et al.'s interesting study of diversity on Earth, in which they stated that there are at least 10,100 prokaryotic species. As you noted in your blog, this statement may be true, but it's not particularly helpful. The first point of our work is to demonstrate rigorously why such statements are both true and unhelpful. The reason is that a lower estimate of species richness may bear no correspondence to the actual value, since the distribution of individuals observed in a sample is insensitive to whether or not there are many rare species in the community.

    Moreover, we then show that one cannot even compare such statements about species richness. For example, imagine two studies of microbial species richness, led by Alice and Bob, in which estimated species richness is inferred from a relatively small sample of the community. Alice estimates that there are at least 5K microbial species in environment A and Bob estimates that there are at least 10K microbial species in environment B. As we have shown, the true number of species may be much bigger in A than in B! Hence the rank-order of lower estimates of species richness need not correspond to the true order of species richness.

    Finally, we show that the Shannon/Simpson diversity of those environments can be estimated from the sample and that the estimates are robust, i.e., can be compared. We did this by constructing lower AND upper bounds for these diversity metrics (and for all Hill diversities) and showing that the range between the bounds for these two metrics is small. The range is small because Shannon/Simpson don't depend sensitively on the abundance of very rare species, due to the weighting of more abundant species in their notions of diversity.

    We welcome feedback, and indeed just received interesting feedback from Anne Chao and Lou Jost pointing out efforts along similar lines. I certainly think this message needs to be heard by researchers studying microbial ecology to avoid both (i) true but uninformative lower estimates presented alone; and (ii) the danger inherent in comparing lower estimates in the absence of upper estimates.

    -Joshua Weitz
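
To make the comment's last technical point concrete, here is a small Python sketch of Hill diversities computed directly from community abundances, using the standard definition D_q = (sum_i p_i^q)^(1/(1-q)), with q = 0 giving richness, q = 1 the exponential of Shannon entropy, and q = 2 the inverse Simpson index. The two toy communities are invented for illustration and are not taken from the paper. The sketch also works with true abundances rather than samples, so it shows only why a rare tail matters little for Shannon and Simpson, not the paper's lower and upper estimates.

```python
import math


def hill_diversity(abundances, q):
    """Hill diversity of order q for a list of non-negative abundances."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # The limit q -> 1 is the exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))


# Hypothetical communities: ten equally common species, with and without
# a tail of 1000 very rare species (about 0.1% of all individuals).
common = [100.0] * 10
with_tail = common + [0.001] * 1000

for q in (0, 1, 2):
    print(q, hill_diversity(common, q), hill_diversity(with_tail, q))
```

With these toy numbers, richness (q = 0) jumps from 10 to 1010 once the tail is added, while the Shannon-based and Simpson-based values stay close to 10.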


