The distributed genome hypothesis states that the gene pool of a bacterial taxon is much more complex than that found in a single individual genome. However, the possible fitness advantage, why such genomic diversity is maintained, whether this variation is largely adaptive or neutral, and why these distinct individuals can coexist, remains poorly understood. Here, we present the infinitely many genes (IMG) model, which is a quantitative, evolutionary model for the distributed genome. It is based on a genealogy of individual genomes and the possibility of gene gain (from an unbounded reservoir of novel genes, e.g., by horizontal gene transfer from distant taxa) and gene loss, for example, by pseudogenization and deletion of genes, during reproduction. By implementing these mechanisms, the IMG model differs from existing concepts for the distributed genome, which cannot differentiate between neutral evolution and adaptation as drivers of the observed genomic diversity. Using the IMG model, we tested whether the distributed genome of 22 full genomes of picocyanobacteria (Prochlorococcus and Synechococcus) shows signs of adaptation or neutrality. We calculated the effective population size of Prochlorococcus at 1.01 × 1011 and predicted 18 distinct clades for this population, only six of which have been isolated and cultured thus far. We predicted that the Prochlorococcus pangenome contains 57,792 genes and found that the evolution of the distributed genome of Prochlorococcus was possibly neutral, whereas that of Synechococcus and the combined sample shows a clear deviation from neutrality.
Wish they had gone beyond these two cyanobacteria ... but still seems of possible interest. Baumdicker, F., Hess, W., & Pfaffelhuber, P. (2012). The Infinitely Many Genes Model for the Distributed Genome of Bacteria Genome Biology and Evolution, 4 (4), 443-456 DOI: 10.1093/gbe/evs016