Saturday, September 06, 2008

Tracing the evolutionary history of Sarah Palin: links to a parasitic nematode and the pathogenic fungus Botryotinia fuckeliana

You see, as a total sequence analysis dork, when I see a name, I frequently ask whether it uses only letters that serve as one-letter amino acid abbreviations. I started this game when the brilliant notes/letters came out in Science in the early 90s about whether ELVIS was overrepresented in protein sequences. Of course, even though these letters are 20 years old, Science still keeps them under wraps, requiring registration to see them (see for example the Stevens letter).

Anyway, alas, three of the major candidates for the US election have names that do not use traditional amino acid abbreviations, so I am stuck with analyzing Sarah Palin. But that is OK because of her professed aversion to evolution and support for Creationism (and since sequence analysis is inherently an evolutionary study).
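
For anyone who wants to play the name game at home, here is a minimal Python sketch of the check. It assumes that "traditional" means the 20 standard one-letter amino acid codes (so B, J, O, U, X and Z are out); the function names and the list of example names are illustrative choices, not anything taken from a real tool.

```python
# The 20 standard one-letter amino acid codes (assumption: this is what
# "traditional" means here; ambiguity codes such as B, X and Z are excluded).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def non_amino_acid_letters(name):
    """Return the sorted letters in `name` that are not standard amino acid codes."""
    letters = {ch for ch in name.upper() if ch.isalpha()}
    return sorted(letters - STANDARD_AMINO_ACIDS)


def is_peptide_name(name):
    """True if every letter in `name` is a standard amino acid code."""
    return not non_amino_acid_letters(name)


# Example names are illustrative; spaces and punctuation are ignored.
for candidate in ["Sarah Palin", "Elvis", "Barack Obama", "John McCain", "Joe Biden"]:
    bad = non_amino_acid_letters(candidate)
    print(candidate, "OK" if not bad else "blocked by " + ", ".join(bad))
```

A name that passes can be pasted into blastp as a pretend peptide once the spaces are removed.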

So - I took her name and went to the NCBI BLAST page and did some searches. And what came up? Well, here are some of the top hits from the blastp searches (which I used to compare the pretend peptide "SARAHPALIN" with all the peptides in the non-redundant collection at GenBank).

>ref|XP_001545292.1| Gene info hypothetical protein BC1G_16161 [Botryotinia fuckeliana B05.10]
gb|EDN25226.1| Gene info predicted protein [Botryotinia fuckeliana B05.10]
Length=383

GENE ID: 5425746 BC1G_16161 | hypothetical protein
[Botryotinia fuckeliana B05.10]

Score = 26.9 bits (56), Expect = 189
Identities = 8/9 (88%), Positives = 8/9 (88%), Gaps = 0/9 (0%)

Query 1 SARAHPALI 9
SARA PALI
Sbjct 209 SARAQPALI 217


>ref|YP_061725.1| Gene info homoserine dehydrogenase [Leifsonia xyli subsp. xyli str. CTCB07]
gb|AAT88620.1| Gene info homoserine dehydrogenase [Leifsonia xyli subsp. xyli str. CTCB07]
Length=451

GENE ID: 2939000 thrA | homoserine dehydrogenase
[Leifsonia xyli subsp. xyli str. CTCB07] (10 or fewer PubMed links)

Score = 26.9 bits (56), Expect = 189
Identities = 8/9 (88%), Positives = 8/9 (88%), Gaps = 0/9 (0%)

Query 1 SARAHPALI 9
SAR HPALI
Sbjct 267 SARVHPALI 275

>ref|ZP_02031476.1| hypothetical protein PARMER_01474 [Parabacteroides merdae ATCC
43184]
gb|EDN87136.1| hypothetical protein PARMER_01474 [Parabacteroides merdae ATCC
43184]
Length=299

Score = 26.1 bits (54), Expect = 340
Identities = 7/8 (87%), Positives = 8/8 (100%), Gaps = 0/8 (0%)

Query 3 RAHPALIN 10
RAHPAL+N
Sbjct 170 RAHPALVN 177

>ref|XP_567332.1| Gene info hypothetical protein CNJ01520 [Cryptococcus neoformans var. neoformans
JEC21]
ref|XP_773201.1| Gene info hypothetical protein CNBJ1950 [Cryptococcus neoformans var. neoformans
B-3501A]
gb|EAL18554.1| Gene info hypothetical protein CNBJ1950 [Cryptococcus neoformans var. neoformans
B-3501A]
gb|AAW45815.1| Gene info hypothetical protein CNJ01520 [Cryptococcus neoformans var. neoformans
JEC21]
Length=437

GENE ID: 3254188 CNJ01520 | hypothetical protein
[Cryptococcus neoformans var. neoformans JEC21] (10 or fewer PubMed links)

Score = 26.1 bits (54), Expect = 340
Identities = 8/9 (88%), Positives = 8/9 (88%), Gaps = 0/9 (0%)

Query 1 SARAHPALI 9
SAR HPALI
Sbjct 415 SARQHPALI 423


>ref|YP_001626035.1| Gene info citrate synthase [Renibacterium salmoninarum ATCC 33209]
gb|ABY24621.1| Gene info citrate synthase [Renibacterium salmoninarum ATCC 33209]
Length=386

GENE ID: 5822379 RSal33209_2898 | citrate synthase
[Renibacterium salmoninarum ATCC 33209]

Score = 25.7 bits (53), Expect = 456
Identities = 9/11 (81%), Positives = 9/11 (81%), Gaps = 2/11 (18%)

Query 1 SARAHP--ALI 9
SARAHP  ALI
Sbjct 218 SARAHPYAALI 228


>ref|YP_001817256.1| Gene info integral membrane sensor hybrid histidine kinase [Opitutus terrae
PB90-1]
gb|ACB73656.1| Gene info integral membrane sensor hybrid histidine kinase [Opitutus terrae
PB90-1]
Length=936

GENE ID: 6208547 Oter_0366 | integral membrane sensor hybrid histidine kinase
[Opitutus terrae PB90-1]

Score = 25.2 bits (52), Expect = 611
Identities = 7/7 (100%), Positives = 7/7 (100%), Gaps = 0/7 (0%)

Query 3 RAHPALI 9
RAHPALI
Sbjct 256 RAHPALI 262


>ref|YP_001757871.1| Gene info putative anti-sigma regulatory factor, serine/threonine protein
kinase [Methylobacterium radiotolerans JCM 2831]
gb|ACB27188.1| Gene info putative anti-sigma regulatory factor, serine/threonine protein
kinase [Methylobacterium radiotolerans JCM 2831]
Length=331

GENE ID: 6141303 Mrad2831_5232 | putative anti-sigma regulatory factor,
serine/threonine protein kinase [Methylobacterium radiotolerans JCM 2831]

Score = 25.2 bits (52), Expect = 611
Identities = 7/8 (87%), Positives = 8/8 (100%), Gaps = 0/8 (0%)

Query 2 ARAHPALI 9
ARAHPAL+
Sbjct 299 ARAHPALV 306

>ref|ZP_01466013.1| hydrolase, TatD family [Stigmatella aurantiaca DW4/3-1]
gb|EAU63211.1| hydrolase, TatD family [Stigmatella aurantiaca DW4/3-1]
Length=209

Score = 25.2 bits (52), Expect = 611
Identities = 7/7 (100%), Positives = 7/7 (100%), Gaps = 0/7 (0%)

Query 3 RAHPALI 9
RAHPALI
Sbjct 79 RAHPALI 85


>ref|YP_001558323.1| Gene info glycosyl transferase group 1 [Clostridium phytofermentans ISDg]
gb|ABX41584.1| Gene info glycosyl transferase group 1 [Clostridium phytofermentans ISDg]
Length=357

GENE ID: 5743305 Cphy_1206 | glycosyl transferase group 1
[Clostridium phytofermentans ISDg]

Score = 25.2 bits (52), Expect = 611
Identities = 8/10 (80%), Positives = 8/10 (80%), Gaps = 0/10 (0%)

Query 1 SARAHPALIN 10
S RAHP LIN
Sbjct 113 SERAHPLLIN 122



There does not appear to be a perfect match in the NCBI NR protein database. But take a close look at the #1 scoring hit. That is right, it is from an organism called Botryotinia fuckeliana. No comment on the appropriateness of this name, but it does contain a term I will probably use a lot if she gets elected.
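
The numbers in the hits above can be sanity-checked with a few lines of Python. This is a rough sketch, not how BLAST itself is implemented: `identities`, `midline` and `effective_search_space` are hypothetical helper names, the percentages are truncated because that is what the output above appears to do (8/9 is shown as 88%), and the "+" marks for similar residues are left out because they need a substitution matrix. The last function uses the standard relation E = m·n·2^(-S') between the Expect value and the bit score to back out the effective search space. For these hits it comes out a bit over 2 × 10^10, and an Expect of 189 means that roughly 189 matches this good are expected by chance alone.

```python
def identities(query, subject):
    """Count identical positions in two aligned strings ('-' marks a gap).

    Returns (matches, alignment_length, truncated_percent).
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    length = len(query)
    return matches, length, 100 * matches // length


def midline(query, subject):
    """Identity-only midline: the shared letter at matches, a space elsewhere."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, subject))


def effective_search_space(bit_score, expect):
    """Solve E = m*n * 2**(-S') for m*n, given a bit score S' and Expect E."""
    return expect * 2 ** bit_score


# Aligned strings and scores copied from the hits above.
print(identities("SARAHPALI", "SARAQPALI"))
print(repr(midline("SARAHP--ALI", "SARAHPYAALI")))
print(f"{effective_search_space(26.9, 189):.2e}")
```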

Of course, anybody who has heard me blather on and on about evolution knows that I am always talking about how BLAST top hits are not a good measure of relatedness per se (see my NAR paper where I first talked about this in 1995). So - I decided to build a tree of Sarah Palin. I used the NCBI Distance Tree option, which you can run from BLAST searches.










Since most likely you cannot see that in enough detail - here is a zoom in.








That one did not come through on the blog so well either, so I decided to output the tree in Newick format and then searched for a program that could draw a better figure on the web (we have tools in my lab to do this, but I am trying to do this all on the web as an exercise). I found a web site that makes drawtree available, plugged in the Newick format, and it made a nicer one.
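
For reference, Newick is just nested parentheses with optional branch lengths after colons. The sketch below pulls the leaf names out of a Newick string with a regular expression. The tree in it is a made-up three-leaf toy with invented branch lengths, not the tree from the NCBI output, and the function name is hypothetical. It does not handle quoted labels or bracketed comments.

```python
import re

# Toy tree for illustration only: the topology and branch lengths are invented.
TOY_NEWICK = "((SARAHPALIN:0.1,Brugia_malayi:0.2):0.05,Botryotinia_fuckeliana:0.3);"


def newick_leaves(newick):
    """Return leaf labels, left to right, from a simple Newick string.

    A leaf label directly follows "(" or ","; labels and branch lengths
    that follow ")" belong to internal nodes and are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


print(newick_leaves(TOY_NEWICK))
```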




Though making trees from really short sequences is not ideal, in this tree, Sarah Palin is shown to be at the root of a branch including a protein from the parasitic nematode Brugia malayi. So if we take an evolutionary interpretation it seems that this causative agent of filariasis (well, a protein from this agent) is descended from SarahPalin. In other words, she seems to be ancestral to this parasite.

So in conclusion - by similarity - SarahPalin is closest to a plant pathogen with an unusual name. And by phylogeny SarahPalin is ancestral to a parasitic nematode. Sounds about right.

22 comments:

  1. That is fantastic! Widely forwarded already...

  2. Wait wait wait!

    In Parabacteroides merdae, the species name 'merdae' came from the French word 'merde', which is… crap (basically, the place in which you find this bacterium…)

    Sequence alignment is so powerful!

  3. You just cannot make this stuff up

  4. Clearly we cannot vote into office someone who may have the power to regulate chromosomal condensation. Especially when she is already under investigation for abuse of power.

  5. I can't believe I missed that. She IS condescending.

  6. Iddo Friedberg, 9/06/2008 4:29 PM

    When I was a PhD student, my advisor was to speak of her work at a meeting of Israeli and Palestinian scientists & students. As the audience was very broad, the talks were to be very non-specific. So what's a bioinformatics lab to do? We decided it would be fun to look for PEACE in SwissProt. The top hit was... wait for it... the receptor for sperm coat protein in the human zona pellucida. The zona pellucida is the outer membrane of the oocyte.

    Make love, not war!

  7. Unbelievable. Sarah Palin is closely related to B. fuckeliana? Too funny to be made up.

    Did you search her kids' names to see if they are related to her? Maybe they aren't...

  8. Priceless!!! I think this should become standard practice for anyone standing for office. In fact, let's propose that anyone with a non-standard name come up with an appropriate spelling of their choice so that such an analysis can be performed.

  9. I have to admit I sat in my chair and sniggered for half an hour after reading this. I then linked anyone I knew who was online at the time... no one else got it.

  10. Any palindromic sequences?

  11. I think we may have to redefine "palindromic" as "sequences that are indicative of very ancestral states" or something like that

  12. I may write a new post on this --- but the best blast hit for "MCCAINPALIN" is ....

    >gb|ACE82429.1| polyprotein [Hepatitis C virus subtype 1a]
    Length=3011

    Score = 26.1 bits (54), Expect = 339
    Identities = 8/10 (80%), Positives = 9/10 (90%), Gaps = 1/10 (10%)

    Query 1 MCCAINPALI 10
    MC AI+PALI
    Sbjct 877 MC-AIHPALI 885

  13. Botryotinia fuckeliana (Botrytis cinerea) is a spoilage agent for wine grapes and also purposely encouraged to generate the Noble rot in the production of grapes for Sauterne style wines. Given her political orientation I can't figure out if this revelation means she is inherently for, or against, alcohol consumption. Maybe she's a pit bull with lipstick who likes dessert wines.

  14. Thankfully, someone changed the name to fuckeliana. Or my blog would not have gotten as much attention. I think given Palin's image, she is more likely to be a beer drinker than a wine drinker (nothing against her personally at all -- I am more on the beer than wine side of things too). And despite really not liking her politics, I cannot help liking some aspects of Palin's persona. Maybe her microbial connection is what does it, given that I still generally refuse to study "macrobes" and don't like a lot of them ;-)

  15. Awesome, just awesome! Thanks!

  16. *sigh* I can't wait to see the McCain attack ad about liberal anti-palin bias in evolutionists that comes from this post.

  17. Trying to just report the facts here Shawna (with a little commentary here and there). If I could have searched using Obama I would have but he has letters in his full name that I could not use.

  18. Good stuff, I agree, and quite funny. I do have to point out though that Sarah Palin, being an extant organism, is not "ancestral" to Brugia malayi. Instead, Sarah Palin and Brugia malayi descended from a common ancestor. I only point this out because we should get our evolutionary biology correct, and because it is perhaps even funnier. I wonder what the common ancestor looked like?

  19. Finally Mark. Finally someone saw that. I put that there on purpose hoping that some other bloggers would call me to task on it. But alas everyone kept focusing on "fuckeliana". Imagine that. Yes, indeed, as a modern organism SARAHPALIN cannot be ancestral to other modern organisms.

  20. I would hardly call her a modern organism.

  21. Karen - how about "An organism living in modern times" or "An organism living in modern times that has not evolved much since it shared a common ancestor with parasitic nematodes"
