Friday, February 18, 2011

Some things to read in light of reported human DNA in bacterial genomes vs. contamination

Well, there are a few interesting papers out there on human DNA and whether it has recently been laterally transferred into microbial genomes.  See, for example:
  • this paper in mBio that suggests there has been lateral transfer of LINE elements from humans to Neisseria species
  • but then see this paper suggesting massive contamination of sequence databases with LINE elements (PLoS One paper on contamination)
So what is going on?  Not clear.  If you want more detail about these papers I suggest reading one of the following
There were other stories out there ... but since Hannah and Ed interviewed me, I am a bit biased about which ones are worth reading.  Here are some others to read though
Personally, I am a bit skeptical of the lateral gene transfer (LGT) claim because most of the evidence they present relies on amplification (i.e., PCR), which can readily amplify trace amounts of contaminating DNA.  But without getting into too many of the details myself, I thought I would just post some background reading connected to some of my past work in this area, for anyone interested in this type of thing:

Information about claim of HGT into humans from bacteria that was in the Lander et al Human Genome paper:
A short story I wrote in 1998 about, well, contamination in genome databases
My colleagues' assembly of nearly complete bacterial genomes from the raw sequence reads of fly genome projects
Complete mitochondrial genome(s) found in Chromosome II of Arabidopsis.  It was very difficult to sort out which reads came from the nuclear genome and which from the mitochondria

1 comment:

  1. Contamination is the bane of my existence. It's constantly making me look silly for the Rfam work. Frequently a perfectly good bacterial family contains high-scoring eukaryotic sequence. RNAI is one of the worst affected families. It's a beautiful phage-encoded RNA that represses phage replication. Unfortunately the phage is used for a lot of sequencing projects and is often not cleaned out of the submissions to the sequence archives. So I'm left with a phage model that annotates a handful of phage homologs and THOUSANDS of contaminant sequences (see the 'species' tab). I've been considering using families like this to identify the authors who have submitted the most contaminated sequence to the sequence archives. I'm just not too sure whether any benefit would come from this.

