Important paper on annotation standards for bacterial/archaeal genomes - readying for the "data deluge"

There is an interesting paper in the journal "Standards in Genomic Sciences" that is worth checking out for anyone interested in genome sequencing and annotation. The paper is "Solving the Problem: Genome Annotation Standards before the Data Deluge" by William (aka Bill) Klimke et al.

It discusses the development of international annotation standards at NCBI (The National Center for Biotechnology Information) in collaboration with others. Note - the paper is Open Access.

Their abstract:
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.

The paper refers extensively to workshops on genome annotation held by NCBI and links to an NCBI page with additional information about those workshops.
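
To make the "minimal standards" idea from the abstract a bit more concrete, here is a rough Python sketch of the kind of check it implies: does an annotation contain a full set of ribosomal RNAs and transfer RNAs? This is only an illustration and not NCBI's actual tool. The input layout, the function name, and the reading of "full set" as 5S/16S/23S rRNAs plus a tRNA for each of the 20 standard amino acids are all assumptions here; the paper itself is the place to look for the real criteria.

```python
# Assumed reading of "a full set"; the paper defines the actual criteria.
REQUIRED_RRNAS = {"5S", "16S", "23S"}
STANDARD_AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def missing_minimal_features(features):
    """Report which required rRNAs and tRNAs are absent from an annotation.

    `features` is an iterable of (feature_type, label) pairs such as
    ("rRNA", "16S") or ("tRNA", "Ala"). This layout is hypothetical and
    would need to be adapted to whatever annotation format is in use.
    """
    features = list(features)  # allow generators, since we scan twice
    seen_rrnas = {label for kind, label in features if kind == "rRNA"}
    seen_trnas = {label for kind, label in features if kind == "tRNA"}
    return {
        "rRNA": sorted(REQUIRED_RRNAS - seen_rrnas),
        "tRNA": sorted(STANDARD_AMINO_ACIDS - seen_trnas),
    }


# Toy annotation that lacks a 5S rRNA and most tRNAs.
example = [("rRNA", "16S"), ("rRNA", "23S"), ("tRNA", "Ala")]
report = missing_minimal_features(example)
print("Missing rRNAs:", report["rRNA"])
print("Number of amino acids without a tRNA:", len(report["tRNA"]))
```

An empty list for both keys would mean the annotation passes this (simplified) check. The abstract also mentions proteins encoding core conserved functions, which this sketch does not attempt to cover.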

Now - never mind the extensive use of the term prokaryote in the paper ... it has a wealth of information and tidbits worth checking out.

For example, the paper has a nice table of annotation tools, databases, and other resources.

Among the other sections worth checking out:
* Discussion of pseudogene annotation and identification
* Discussion of variation in structural annotation
* Evidence standards
* Functional annotation and naming guidelines

For anyone interested in annotating a genome - and more and more people are these days, given the decrease in sequencing costs - this is a must-read.

