Abstract: Massively parallel high throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughout sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies cause a predicament though, because although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughout studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ~8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.
Not sure I like everything in the paper. For example, they focus on naive Bayesian classification methods ... when (of course) I prefer phylogenetic methods. But that is a small issue. Overall there is a lot of useful detail in here about rRNA based taxonomic studies. I note - some of this probably applies to metagenomic studies as well ... perhaps this group will do a comparable analysis of metagenomics next?
Mizrahi-Man O, Davenport ER, Gilad Y (2013) Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads: Evaluation of Effective Study Designs. PLoS ONE 8(1): e53608. doi:10.1371/journal.pone.0053608
I note - if you want to catch up / learn / research metagenomics and phylogeny or classification check out the Mendeley group I started on the topic: