I am writing because I am working on a project to evaluate the importance of finishing microbial genomes. I know there has been lots of talk about this on the web, in papers, etc., but I think a fresh discussion is useful. To get people up to speed, below is a summary of the issue as I see it.
- Shotgun sequencing: Genome sequencing projects generally begin with the shotgun method, in which DNA fragments from an organism of interest are sequenced in a highly random manner.
- Assembly: After shotgun sequencing, the genome is assembled as well as possible into larger pieces (called contigs) and ordered sets of contigs (called scaffolds). All of this put together can be called an "assembly".
- Gaps: After the assembly phase, there are almost always gaps in the assembly. These generally come in two forms:
- sequencing gaps (where we know two contigs go together in some orientation but where we do not know the sequence of the DNA in between the contigs)
- physical gaps (where we have sets of scaffolds but do not know how they connect to each other).
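To make the terms above concrete, here is a minimal Python sketch of how one might represent an assembly and count the two kinds of gaps. Everything in it is hypothetical and for illustration only: the class names `Contig` and `Scaffold`, the `summarize` function, and the made-up contig names and lengths are not from any real project or tool. It also assumes, for simplicity, that the scaffolds would ultimately join into a single linear chain, so the number of physical gaps is taken to be one less than the number of scaffolds.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Contig:
    name: str
    length: int  # number of sequenced bases in this contig


@dataclass
class Scaffold:
    name: str
    # Contigs in their known order; the sequence between neighbors is unknown.
    contigs: List[Contig] = field(default_factory=list)

    def sequencing_gaps(self) -> int:
        # One sequencing gap sits between each pair of adjacent contigs.
        return max(len(self.contigs) - 1, 0)


def summarize(scaffolds: List[Scaffold]) -> Dict[str, int]:
    """Count contigs, bases, and both kinds of gaps in an assembly."""
    return {
        "scaffolds": len(scaffolds),
        "contigs": sum(len(s.contigs) for s in scaffolds),
        "bases": sum(c.length for s in scaffolds for c in s.contigs),
        "sequencing_gaps": sum(s.sequencing_gaps() for s in scaffolds),
        # Assumption: scaffolds would join into one linear chain, so the
        # number of unknown scaffold-to-scaffold joins is (scaffolds - 1).
        "physical_gaps": max(len(scaffolds) - 1, 0),
    }


# Made-up example assembly: three scaffolds holding six contigs in total.
example = [
    Scaffold("scaffold_1", [Contig("c1", 120_000), Contig("c2", 45_000), Contig("c3", 80_000)]),
    Scaffold("scaffold_2", [Contig("c4", 300_000), Contig("c5", 15_000)]),
    Scaffold("scaffold_3", [Contig("c6", 60_000)]),
]

summary = summarize(example)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this toy assembly, a fully finished genome would correspond to a summary with zero gaps of either kind. A real circular chromosome would need one more join to close the circle, which this sketch deliberately ignores.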
We plan to try to measure what one gains from the finishing steps (the additional work done to close these gaps). We need to know this because we would like to make intelligent decisions about how to allocate resources. If one gains a lot from finishing, then it would make sense to allocate significant resources to it. I should note that some colleagues and I wrote a paper on this issue, "The value of complete microbial genome sequencing (You get what you pay for)", published in 2002. It is certainly not the only discussion of the topic, but I wanted to point out that I have been involved in this debate before. Despite that, I think we simply do not know right now what the benefits might be in the new sequencing landscape.
So the question I am asking here is:
What do people think are the potential benefits that could come from finishing?
Here are some possible answers to get the discussion going:
- Gene discovery (e.g., there may be interesting/important genes in missing/low quality data)
- Esthetics of completeness (as in, it just feels better to have a finished genome)
- Improved analysis of genome organization (in particular from having contigs oriented correctly)
Also, I note there has been some discussion of this for animals, plants, etc. (e.g., see the recent paper by Eric Green and others on vertebrates). Many of the issues are similar, but they are different enough that I think a microbe-focused discussion is useful.
Other links of interest:
Blakesley, R., Hansen, N., Gupta, J., McDowell, J., Maskeri, B., Barnabas, B., Brooks, S., Coleman, H., Haghighi, P., Ho, S., Schandler, K., Stantripop, S., Vogt, J., Thomas, P., NISC Comparative Sequencing Program, Bouffard, G., & Green, E. (2010). Effort required to finish shotgun-generated genome sequences differs significantly among vertebrates. BMC Genomics, 11 (1). DOI: 10.1186/1471-2164-11-21
Fraser, C., Eisen, J., Nelson, K., Paulsen, I., & Salzberg, S. (2002). The Value of Complete Microbial Genome Sequencing (You Get What You Pay For). Journal of Bacteriology, 184 (23), 6403-6405. DOI: 10.1128/JB.184.23.6403-6405.2002
Chain, P., et al. (2009). Genome Project Standards in a New Era of Sequencing. Science, 326 (5950), 236-237. DOI: 10.1126/science.1180614