The Tree of Life: Some things to read in light of reported human DNA in bacterial genomes vs. contamination

Friday, February 18, 2011

Well, there are a few interesting papers out there on human DNA and whether some of it has recently been laterally transferred into microbial genomes. See for example:

this paper in mBio that suggests there has been lateral transfer of LINE elements from humans to Neisseria species
but then see this paper suggesting massive contamination of sequence databases with LINE elements (PLoS One paper on contamination)

So what is going on? Not clear. If you want more detail about these papers, I suggest reading one of the following:

Mark Pallen: Human DNA in bacterial genomes? Yes? No? Maybe?
Ed Yong: Gonorrhea has picked up human DNA (and that’s just the beginning)
Hannah Waters: Contaminated genomes
Hannah Waters: Transitioning into “real” science journalism

There were other stories out there ... but since Hannah and Ed interviewed me, I am a bit biased about which ones are worth reading. Here are some others to read though

Personally, I am a bit skeptical of the lateral gene transfer (LGT) claim because most of the evidence they present relies on amplification (i.e., PCR). But without getting into too many of the details myself, I thought I would just post some background reading connected to some of my past work in this area, for anyone interested in this type of thing:

Information about the claim of HGT from bacteria into humans that was in the Lander et al. human genome paper:

One use of finishing genomes: helping rule out contamination

You get what you pay for

A short story I wrote in 1998 about, well, contamination in genome databases

HYG101

My colleagues' assembly of nearly complete bacterial genomes from the raw sequence reads of fly genome projects:

Three Bacterial Genomes Found Lurking Inside Recently Sequenced ...

Complete mitochondrial genome(s) found in Chromosome II of Arabidopsis. It was very difficult to sort out which reads came from the nuclear genome and which from the mitochondria:

Lin et al 1999. Arabidopsis chromosome II. Nature.

1 comment:

Paul Gardner, 2/18/2011 3:02 PM
Contamination is the bane of my existence. It's constantly making me look silly for the Rfam work. Frequently a perfectly good bacterial family contains high-scoring eukaryotic sequence. RNAI is one of the worst affected families. It's a beautiful phage encoded RNA that represses phage replication. Unfortunately the phage is used for a lot of sequencing projects and is often not cleaned up from the submissions to the sequence archives. So I'm left with a phage model that annotates a handful of phage homologs and THOUSANDS of contaminant sequences (see the 'species' tab). I've been considering using families like this to identify the authors who have submitted the most contaminated sequence to the sequence archives. I'm just not too sure if any benefit would come from this or not.