Wednesday, May 01, 2013

The need for a phylogeny driven genomic encyclopedia of eukaryotes

On Monday I gave a talk at the SMBE Eukaryotic Omics satellite meeting that has been going on at UC Davis.  When Holly Bik, a postdoc in my lab, asked me to talk at the meeting, I said, basically, "Well, OK, but I don't really do much work on eukaryotes."  And then I came up with an idea: I could make my talk about how it might be good to have a better phylogenetic sampling of eukaryotic genome sequences.  I have been a bit obsessed for many, many years with phylogenetic sampling of genomes and, though I have mostly avoided eukaryotes in my genome sequencing work, I figured I should still get on my soapbox about how phylogenetic sampling is a good thing.  So, well, I did.  And I think we (i.e., the scientific community) really need a better sampling of eukaryotic genomes.

I have posted my talk to SlideShare, and I recorded audio of the talk in sync with the slides and posted that to YouTube.  These are below.

I am hereby calling on anyone interested in participating in such a phylogeny driven genomic encyclopedia of eukaryotes to make yourselves known.  We NEED to do this.


  1. Thought du jour... could something like this be folded in to something like EOL? Probably the other way around, build the genomic encyclopedia and link EOL into it. I'd love to have an integrated view of everything from the telomeres to the tail feathers.

    1. EOL and genome projects like this should definitely be connected ... but not sure EOL is the right umbrella to actually fold this under

  2. I couldn't make #SMBEeuks (although Twitter coverage was pretty good!) but I'm very interested in this, and I'd love to be involved. My interests are evolutionary comparative genomics rather than any specific taxonomic group though. I even think you understate the importance above, why is phylogeny-based sampling of life not THE top priority of the genomics community?

  3. My opinion is that it will be important to finish (good annotation and all) a few complete genomes first, which will then make it easier to work on new genomes. With today's sequencing technology, the annotation is a weak link in the chain, and under-funded. We have "complete genomes" of dozens of modern humans, one Neanderthal, one chimpanzee, one gorilla and a few other primates for example. But asking questions about specific differences in those genomes is not yet possible.

    Even with bacteria, complete genomes are annotated to different standards for different species. I often find my favorite genes (the fusA gene, which encodes the elongation factor G ribosomal translocase protein, and the lepA gene, which encodes a related GTP-binding protein, the ribosomal back-translocase) annotated differently in different species. In close relatives of Escherichia the fusA gene is usually correctly annotated, lepA not so much.

    I guess my point would be that "more data" is not always helpful in an era of data overload. We need some focus on data curation as well. So far, GenBank and similar databases are not following the wikipedia model of crowd-sourcing the data curation. And it is not clear that such crowd-sourcing will work in this field anyway, because the crowd of biologists who are educated enough to do this work are too busy writing 100 grant proposals to get 1 grant funded...

    Given the current level of genomics, it may be more useful to sequence 50 to 100 well characterized genes plus the complete mitochondrial genome, from each of 50 to 1,000 phylogenetically sampled eukaryotes, rather than aiming for the complete genomes. If it is currently impossible to compare human chromosome 1 to chimpanzee chromosome 1, I don't see much hope for comparing mammals to yeast in the near future. However, we can already compare any given gene, such as DNA polymerase III, or the genes for the ribosomal translocases, across a wide variety of organisms.

    1. The main benefits I propose(d) to having a better phylogenetic sampling of genomes are mostly not related to generating a better phylogeny (although that is one use). And sampling 50 well characterized genes across taxa will not provide these benefits. As for completeness vs. not, I am not so sure you are right. We found in our bacterial and archaeal genomic encyclopedia that completeness was not that big of a bonus. Certainly good annotation is important, but that does not necessarily come from having a complete genome. Anyway - the fact is people are sequencing 1000s of genomes. And I think it would be silly to not include in those 1000s and 1000s some genomes from across the tree of life ...

