Wednesday, February 11, 2009

Charles Darwin relic hidden in the chimp and human genomes

So - in honor of Charles Darwin, and as a follow-up to my analysis of Sarah Palin's name (which, amazingly, turned up as its best hit a fungus called B. fuckeliana), I decided today to do some BLAST searches with old Charlie D.'s name. You see, every letter in CHARLES DARWIN is also the one-letter abbreviation of one of the amino acids that make up proteins, so you can pretend his name is a protein and compare it to proteins from other organisms.
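A quick way to check whether a name can "pass" as a protein is to test each letter against the one-letter amino-acid codes. This is a minimal sketch that accepts only the 20 standard codes (an assumption: BLAST itself also accepts a few extra codes such as X), and `name_to_peptide` and `three_letter` are hypothetical helpers written for illustration.

```python
from typing import Optional

# The 20 standard amino acids: one-letter code -> three-letter code.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def name_to_peptide(name: str) -> Optional[str]:
    """Drop spaces/punctuation, uppercase, and return the name as a peptide
    string, or None if any letter is not a standard amino-acid code."""
    letters = "".join(ch for ch in name.upper() if ch.isalpha())
    if letters and all(ch in AMINO_ACIDS for ch in letters):
        return letters
    return None


def three_letter(peptide: str) -> str:
    """Spell a peptide out with three-letter codes, e.g. 'CH' -> 'Cys-His'."""
    return "-".join(AMINO_ACIDS[ch] for ch in peptide)


if __name__ == "__main__":
    peptide = name_to_peptide("Charles Darwin")
    print(peptide)                # CHARLESDARWIN
    print(three_letter(peptide))  # Cys-His-Ala-Arg-Leu-Glu-Ser-Asp-Ala-Arg-Trp-Ile-Asn
```

Sarah Palin's name passes the same test ("SARAHPALIN"), while a name containing B, J, O, U, X or Z would not.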

So I went to the NCBI BLAST page and did a BLASTP search. BLASTP searches a peptide (known as the query) against a database of peptides and identifies any database sequences whose amino-acid sequences are similar to the query. To make this work, I had to adjust some of the default parameters so that short matches could be detected (I raised the "expect" threshold to 10000, so that hits expected to turn up by chance as many as 10000 times would still be reported).
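Here is a minimal sketch of what raising that threshold does, under a simplified model in which BLAST reports a hit only if its E value does not exceed the expect threshold. The E values are copied from a few rows of the search results below; the default threshold of 10 is an assumption (defaults have changed over the years), and `report` is a hypothetical helper, not part of any BLAST tool.

```python
from typing import List, Tuple

# (accession, bit score, E value) for a few rows of the results table below.
HITS: List[Tuple[str, float, float]] = [
    ("AC199643.3", 25.8, 1930),
    ("AC093749.3", 25.8, 1930),
    ("AC217674.3", 25.0, 3549),
    ("XM_787798.2", 24.3, 6861),
    ("AC195625.1", 23.9, 7711),
]


def report(hits: List[Tuple[str, float, float]], expect_threshold: float):
    """Simplified expect cutoff: keep hits whose E value does not exceed the
    threshold, best (lowest E value) first."""
    kept = [h for h in hits if h[2] <= expect_threshold]
    return sorted(kept, key=lambda h: h[2])


if __name__ == "__main__":
    print(len(report(HITS, 10)))     # 0: none of these weak hits is reported
    print(len(report(HITS, 10000)))  # 5
```

Short queries like a 13-letter name can only produce low-scoring alignments, so their E values are large and a small threshold hides every hit.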

Alas, no convincing matches to known or predicted proteins came up. So I was sad. Then I said, what if Darwin was hidden in the genome of some organism? So I did a "translational" BLAST search called tblastn, which takes a peptide, translates the DNA in the database into all the peptides it could possibly encode (all six reading frames), and searches the peptide against those translations. When one does this, one can find "hidden" proteins, or relics of proteins, in the DNA that may not have been labelled as proteins by whoever analyzed the DNA data.
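To make the "all possible peptides" step concrete, here is a small sketch of six-frame translation, the idea behind a translated search like tblastn: three reading frames on the given strand and three on its reverse complement, using the standard genetic code. The helper names and the "+1" to "-3" labels are this sketch's own conventions (BLAST's frame numbering may differ in detail), the demo DNA is a hypothetical back-translation of the chimp peptide CHVRLEQDSKYIN rather than the real clone sequence, and real tblastn also does the alignment and scoring, which is left out here.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base and reverse the sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; stops become '*', unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate three offsets on each strand."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical DNA: one possible back-translation of CHVRLEQDSKYIN.
    coding = "TGTCATGTTCGTCTTGAACAAGATTCTAAATATATTAAT"
    genomic = reverse_complement(coding)  # place it on the minus strand
    for frame, peptide in six_frames(genomic).items():
        print(frame, peptide)  # the "-1" line reads CHVRLEQDSKYIN
```

Only one of the six translations carries the hidden peptide; an annotator looking only at the forward strand would never see it.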

And what did I find with this tblastn search? A jackpot to make evolutionary biologists VERY happy. The best matches for CHARLESDARWIN the peptide? Pan troglodytes, AKA chimps. And humans (the matches were equally strong).

So - hidden in the human and chimp genomes is a relic of one Charles Darwin. Happy Birthday, Charlie.

----------------------------------------------
See search results below:

Sequences producing significant alignments:                       Score (Bits)   E Value


gb|AC199643.3| Pan troglodytes BAC clone CH251-444E8 from chr... 25.8 1930
gb|AC093749.3| Homo sapiens BAC clone RP11-30B7 from 4, compl... 25.8 1930
gb|AF250324.1|AF250324 Homo sapiens chromosome 4q35 BAC clone... 25.8 1930
gb|AC217674.3| Pan troglodytes BAC clone CH251-398H5 from chr... 25.0 3549
gb|AC195095.2| Pan troglodytes BAC clone CH251-577A14 from ch... 25.0 3549
gb|AC188794.3| Pan troglodytes BAC clone CH251-69H24 from chr... 25.0 3549
gb|AC183104.3| Pan troglodytes BAC clone CH251-567E15 from ch... 25.0 3549
gb|AF105153.3| Homo sapiens alpha-satellite centromere border... 25.0 3549
emb|AL353763.14| Human DNA sequence from clone RP11-87H9 on c... 25.0 3549
gb|AC116618.4| Homo sapiens BAC clone RP11-98L17 from 4, comp... 25.0 3549
emb|CR786580.6| Human DNA sequence from clone RP11-764K9 on c... 25.0 3549
emb|AL591385.7| Human DNA sequence from clone RP11-391M20 on ... 25.0 3549
emb|AL445925.19| Human DNA sequence from clone RP11-403A15 on... 25.0 3549
emb|AL592183.10| Human DNA sequence from clone RP11-297D8 on ... 25.0 3549
ref|XM_787798.2| PREDICTED: Strongylocentrotus purpuratus sim... 24.3 6861
ref|XM_001201471.1| PREDICTED: Strongylocentrotus purpuratus ... 24.3 6861
gb|AC195625.1| Pan troglodytes BAC clone CH251-895L14 from ch... 23.9 7711
gb|AC175749.2| Pan troglodytes BAC clone CH251-1124N9 from ch... 23.9 7711
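For reading the E Value column: in the Karlin-Altschul statistics BLAST uses, a hit's E value and its bit score S' are related approximately as below, where m and n are the effective lengths of the query and the database. As a back-of-the-envelope check that ignores the composition-based adjustments BLAST actually applies, the top hits imply an effective search space of roughly 10^11, and an E value of 1930 means about 1930 alignments scoring this well would be expected by chance alone.

```latex
E \approx m\,n\,2^{-S'}
\qquad\Longrightarrow\qquad
m\,n \approx E \cdot 2^{S'} = 1930 \times 2^{25.8} \approx 1.1 \times 10^{11}
```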



gb|AC199643.3| Pan troglodytes BAC clone CH251-444E8 from chromosome 7, complete sequence
Length=155150

Score = 25.8 bits (55), Expect = 1930, Method: Composition-based stats.
Identities = 8/13 (61%), Positives = 11/13 (84%), Gaps = 0/13 (0%)
Frame = -2

Query  1       CHARLESDARWIN  13
               CH RLE D+++IN
Sbjct  145762  CHVRLEQDSKYIN  145724


gb|AC093749.3| Homo sapiens BAC clone RP11-30B7 from 4, complete sequence
Length=163102

Score = 25.8 bits (55), Expect = 1930, Method: Composition-based stats.
Identities = 8/13 (61%), Positives = 11/13 (84%), Gaps = 0/13 (0%)
Frame = -3

Query  1      CHARLESDARWIN  13
              CH RLE D+++IN
Sbjct  31925  CHVRLEQDSKYIN  31887
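The middle line of each alignment, and the Identities and Positives counts, can be reproduced from the two sequences plus a substitution matrix. This sketch assumes the search used BLOSUM62 (the usual default) and includes only the five mismatched pairs that occur in this alignment, so `BLOSUM62_MISMATCHES` is a deliberately partial table; `summarize` is a hypothetical helper.

```python
# Partial BLOSUM62 (assumed matrix): only the mismatched pairs in this one
# alignment, not the full matrix.
BLOSUM62_MISMATCHES = {
    frozenset("AV"): 0,
    frozenset("SQ"): 0,
    frozenset("AS"): 1,
    frozenset("RK"): 2,
    frozenset("WY"): 2,
}


def summarize(query: str, sbjct: str, scores: dict):
    """Return (midline, identities, positives) for an ungapped alignment.

    Midline: the letter for an identity, '+' for a positive-scoring
    substitution, a space otherwise. Positives include identities."""
    marks = []
    for q, s in zip(query, sbjct):
        if q == s:
            marks.append(q)
        elif scores[frozenset((q, s))] > 0:
            marks.append("+")
        else:
            marks.append(" ")
    line = "".join(marks)
    identities = sum(1 for q, s in zip(query, sbjct) if q == s)
    positives = identities + line.count("+")
    return line, identities, positives


if __name__ == "__main__":
    query, sbjct = "CHARLESDARWIN", "CHVRLEQDSKYIN"
    line, ident, pos = summarize(query, sbjct, BLOSUM62_MISMATCHES)
    n = len(query)
    print(query)
    print(line)   # CH RLE D+++IN
    print(sbjct)
    print(f"Identities = {ident}/{n} ({100 * ident // n}%), "
          f"Positives = {pos}/{n} ({100 * pos // n}%)")
    # Identities = 8/13 (61%), Positives = 11/13 (84%)
```

Integer division gives the rounded-down 61% and 84% that appear in the BLAST output.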

2 comments:

  1. Now the only question is, when did the intelligent designer put that there for us to find?

  2. I thought for sure you had made all this up, but I checked and you are completely right!

