Monday, August 09, 2010

Lack of neutrality in bacteria and where pseudogenes go when they die

Pseudogenes, which are in essence regions of the genome that used to be genes but are no longer able to produce a functional unit, have long been considered to be models of the genetic equivalent of Switzerland's neutrality. With this assumption of neutrality in hand, researchers have used studies of pseudogenes to better understand what happens to DNA when it is not visible to any form of natural selection. That is, pseudogenes have been thought to be neither harmful (as in, they are not under negative selection) nor helpful (i.e., they are not under positive selection).

And from this assumption we have supposedly learned about mutation rates and patterns (because if they are neutral then the changes in pseudogenes should be reflective of mutational processes, not selection) as well as all sorts of other features of genome evolution.

Over the years, some have challenged the assumption of neutrality of pseudogenes (e.g., see here), just as many have questioned whether Switzerland is really neutral. But overall, the feeling that pseudogenes were mostly neutral seems to have stuck. However, that may change a bit with a new paper from Chih-Horng Kuo and Howard Ochman in PLoS Genetics (PLoS Genetics: The Extinction Dynamics of Bacterial Pseudogenes).

In their paper they report (this is their author summary):
Pseudogenes have traditionally been viewed as evolving in a strictly neutral manner. In bacteria, however, pseudogenes are deleted rapidly from genomes, suggesting that their presence is somehow deleterious. The distribution of pseudogenes among sequenced strains of Salmonella indicates that removal of many of these apparently functionless regions is attributable to their deleterious effects in cell fitness, suggesting that a sizeable fraction of pseudogenes are under selection.
Basically, what they did was the following:

1. Compare Salmonella genomes. Identify putative pseudogenes and trace their evolution onto a phylogeny of the species.

Figure 1. Distribution of pseudogenes among Salmonella genomes.
The phylogenetic tree was inferred from 2,898 single-copy genes shared by all five S. enterica subsp. enterica strains and the outgroup S. enterica subsp. arizonae.

2. Carry out a variety of analyses of the pseudogenes such as
  • looking at ratios of Ka/Ks (this is in essence the ratio of amino acid changes, aka nonsynonymous substitutions, to "silent" synonymous changes, which occur when the DNA sequence changes but the same amino acid is encoded).
  • examining the types and frequencies of gene inactivating mutations
3. Then they looked at the "ages" of pseudogenes - with age being estimated by the position in the tree in which the pseudogenes appear to have arisen.

4. Finally they examined the age class distribution of pseudogenes as well as whether there were other differences between pseudogenes of different ages. And what they found was inconsistent with a neutral model. Instead, what they conclude is that something is making it advantageous to delete pseudogenes more rapidly than one might expect.
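To make the Ka/Ks idea from step 2 a bit more concrete, here is a rough Python sketch of how one might count synonymous versus nonsynonymous differences between two aligned coding sequences. This is not the method used in the paper. It is a stripped-down version of the site-counting idea behind Nei-Gojobori-style estimates, and it makes several simplifying assumptions: the sequences are already aligned in frame with no gaps, only A/C/G/T appear, codons differing at more than one position are skipped, changes to stop codons are counted as nonsynonymous, and no correction for multiple hits is applied (so it reports uncorrected pN/pS rather than a true Ka/Ks). The function names are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                count += 1
    # Each position has three possible changes, so divide the tally by 3.
    return count / 3


def pn_ps(seq_a, seq_b):
    """Uncorrected pN, pS and their ratio for two aligned, in-frame sequences."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")

    syn_sites = 0.0
    syn_diffs = nonsyn_diffs = codons_used = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(a, b))
        if n_diff > 1:
            continue  # simplification: ignore codons with multiple changes
        codons_used += 1
        syn_sites += (synonymous_sites(a) + synonymous_sites(b)) / 2
        if n_diff == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1

    nonsyn_sites = 3 * codons_used - syn_sites
    p_s = syn_diffs / syn_sites if syn_sites else None
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else None
    ratio = p_n / p_s if (p_s and p_n is not None) else None
    return {"syn_diffs": syn_diffs, "nonsyn_diffs": nonsyn_diffs,
            "pN": p_n, "pS": p_s, "ratio": ratio}


# Toy example: GCT->GCC is silent (Ala), AAA->AGA changes Lys to Arg.
print(pn_ps("ATGGCTAAA", "ATGGCCAGA"))
```

The intuition is that a functional gene under purifying selection should show a ratio well below 1, while a sequence evolving with no constraint on its protein product should drift toward 1. Real analyses would use an established tool and a proper substitution model rather than a toy like this.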

What explains this? After testing multiple possibilities the authors conclude that there is some negative selection against pseudogenes (or I guess positive selection for deletion of pseudogenes).

They conclude by suggesting this is likely to be pervasive across all bacteria and even in archaea. They also make a connection to possible selection on intron size in eukaryotes. Anyway - the paper seems quite interesting and worth a read. Still pondering what it all means, so I would welcome comments.

Kuo, C., & Ochman, H. (2010). The Extinction Dynamics of Bacterial Pseudogenes. PLoS Genetics, 6 (8). DOI: 10.1371/journal.pgen.1001050


  1. I'm not exactly sure what to make of it, either, but I'm reminded of a paper by Organ et al. (2007).

    Basically, avians have smaller genomes than those of other extant amniotes. It's been assumed that this is due to weight optimization for flight. But, by analyzing the bone cell size of various stem-avians, they found that small genome size probably evolved well before flight.

    I assume the avian genome is small due to deletion of pseudogenes, but I don't know.

  2. Thanks for highlighting this paper. Until I understand how they identified their pseudogenes, I won't understand how they know that pseudogenes are rapidly deleted and not just rapidly eliminated by selection against the organisms carrying them. Maybe I'll have to clarify my thinking by doing a blog post myself - if I do I'll link to yours.

  3. I read the Methods and looked at reference 19, and I think I understand how they identified their pseudogenes. Basically, they aligned genes present in all strains, and, in pairwise alignments, looked for places where a gene was not identified in one or more strains because it no longer coded for a full length protein.
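    To illustrate the general idea of "no longer codes for a full-length protein", here is a very rough Python sketch. It is not the pipeline from the paper or from reference 19. It only checks for a premature in-frame stop codon relative to a reference copy, it assumes both sequences start at the same start codon and contain no gaps, and the 0.8 length cutoff and the function names are arbitrary choices for illustration.

    ```python
    STOP_CODONS = {"TAA", "TAG", "TGA"}


    def coding_length(seq):
        """Number of codons read in frame before the first stop codon (or sequence end)."""
        seq = seq.upper()
        n_codons = 0
        for i in range(0, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                break
            n_codons += 1
        return n_codons


    def is_candidate_pseudogene(reference, query, min_fraction=0.8):
        """Flag the query if its open reading frame is much shorter than the reference's.

        min_fraction is a hypothetical threshold, not a value taken from the paper.
        """
        ref_len = coding_length(reference)
        if ref_len == 0:
            return False  # nothing to compare against
        return coding_length(query) / ref_len < min_fraction


    intact = "ATGGCTAAAGGTTAA"      # Met-Ala-Lys-Gly-stop
    truncated = "ATGTAAAAAGGTTAA"   # stop codon right after Met
    print(is_candidate_pseudogene(intact, truncated))
    print(is_candidate_pseudogene(intact, intact))
    ```

    A frameshifting insertion or deletion would usually show up here too, because the shifted frame tends to hit a stop codon early, but a real analysis would work from proper alignments rather than raw reading frames.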

    Here's my confusion:

    I agree that the scarcity of 'old' pseudogenes (ones that have accumulated multiple inactivating mutations) means that carrying pseudogenes must be costly. But how do they distinguish between the following explanations?

    1. We see few old pseudogenes because individuals with new mutations that delete a pseudogene are favoured over individuals (in the same lineage) that retain the pseudogene.


    2. We see few old pseudogenes because individuals carrying a pseudogene are outcompeted by individuals (in other lineages) who retain a functional version of this gene.

  4. So your number 2 is sort of interspecific or at least interstrain competition? I think there are a whole slew of ways one could end up with pseudogenes being costly. I am going to have to ponder more and get back to you.

  5. I would think this is a consequence of selection for smaller genome size amongst prokaryotes, no? This paper for example, doi: 10.1098/rspb.1999.0872, demonstrates that substantial fitness costs are imposed on E. coli when they are forced to transcribe non-essential plasmid genes, and that experimental selection acts to remove those sequences; I imagine it would be much the same case for pseudogenes. I.e., transcription is energetically costly, so prokaryotes have to be "lean and mean" to compete. In Nick Lane's book, Power, Sex, Suicide, he speculates that the selection against pseudogenes in us eukaryotes is weaker because of our innovation of mitochondria -- they increase the respiratory efficiency of the individual cell, engendering multicellularity, and allowing us to grow large and explore new life-history strategies, by decoupling the previous tight linkage between genome size and fitness.

  6. Interesting paper, but I think it might be a little premature to state that this may apply to all Bacteria and Archaea. Couldn't this trend be due to some selection pressure on the Salmonella lineage to be reducing in size in general and that pseudogenes are just the easiest targets for this reduction? Considering that the outgroup species has more genes than all other species used in this study could suggest that there is global negative selection on genome size and that pseudogenes are just part of this.

  7. Morgan - the argument they make is basically captured in their conclusion "Because all bacterial groups, as well as those Archaea examined, display a mutational pattern that is biased towards deletions [18], [19], [33] and their haploid genomes would be more susceptible to dominant-negative effects that pseudogenes might impart, it is likely that the process of adaptive removal of pseudogenes is pervasive among prokaryotes"

  8. I think there are two slightly different, but related issues here. At least in how I have always thought of it, the assumption of pseudogene neutrality is strictly about the substitution process going on in a pseudogene when presence/absence of the pseudogene is itself a neutral trait.

    Layered on top of that is possible selective forces acting on the presence of that pseudogene. Most eukaryotic parasites for instance appear to have strong selective pressures acting on genome size, so pseudogenes tend to be lost.

  9. That's a good point Rosie, I think distinguishing between those two criteria should be possible, at least in principle. Of course it could also be a mixture between the two. I can see situation 2 arising in many cases depending on the gene in question.

  10. OK, I did write a blog post about my concern with this paper (at http://rrresearch. Here's the bottom line:

    Is the DNA of new pseudogenes quickly lost from genomes by deletion, creating strains that are more fit than those with the pseudogene (but probably not more fit than the ancestor with the functional gene)? This predicts that sequenced genomes should contain many sites where 'core' genes have been deleted.

    Alternatively, are cells containing new pseudogenes quickly lost from populations because the cells compete poorly with cells that retain the functional gene? This predicts that sequenced genomes will typically all contain the same core genes.

  11. This comment has been removed by the author.

  12. So what about all the proteomic work showing that lots of what people call 'pseudogenes' are translated? Whenever people talk about pseudogenes, they never bring this up.

  13. Which proteomic work Sam? Most of what I have seen wasn't pseudogene specific and was centered around transcription, not translated proteins. Lots of pseudogenes are going to be partially transcribed certainly, it just depends on how much mutation has accumulated, but they generally have lots of premature stop codons and are unlikely to be translated in meaningful amounts.

    This reminds me of the situation with the results from microarray tiling experiments and finding lots of transcription all over Eukaryotic genomes. RNA-Seq experiments have begun to show that most of that is extremely low level, which is to be expected. Biology is messy and imperfect, which applies to transcription and translation as much as it does to anything else.

  14. DG - Here are two PMIDs with proteomics (not transcriptomics) showing pseudogenes being translated. I'm an author on both, and would be very curious to hear what people have to say. From a research standpoint, I've not been able to get many good conversations in about this.

    Additionally, I've got another paper in review where I got 15 more proteomics data sets (for a variety of reasons). The relevant part for this is - most genomes where people annotate something as pseudogene, I find some of those translated. I'm not claiming that all annotations are wrong, just that a non-negligible part are translated.

    PMID: 20687929
    PMID: 19098097

  15. @Sam Thanks for the references, and I see now what your point is more clearly and I agree. Mis-annotation, which seems to be the main thrust of both papers, and errors in gene models are definitely a constantly recurring issue. Issues surrounding this come up quite frequently in lab meetings in the group I work in, and came up a few times in a recent workshop I attended on eukaryotic genome projects and annotations.

    I think, with regards to pseudogenes, there are two issues. One is that some as yet unknown percentage of computationally predicted pseudogenes are mis-annotations of potentially still functional duplicate copies.

    The other is that some percentage of pseudogenes, depending on where any potential stop codons are, likely do get translated at some low basal level spuriously. I am guessing these would be skewed towards proteins with few interacting partners otherwise I would expect more deleterious results to occur. Did you observe any of this in your proteomics data? Truncated proteins being translated?

  16. @DG -
    I've seen everything. Without resequencing to make sure that there are not errors in the underlying genome sequence, I can't be sure of what is actually happening. But I have seen both N-terminal and C-terminal fragments be translated.

  17. Sam, DG, Rosie et al.

    Here's an idea. What about assessing the evolutionary trajectory of ALL genes in these genomes and not deciding in advance which ones are pseudogenes and which ones are not? And then one could catalog which ones get lost and then analyze other features like dS/dN, Ks/Ka, etc. for all genes. One could then test various lists of predicted pseudogenes to see if they evolve in some way differently than any other genes that are lost. Or in other words, do all genes that get lost follow the same general path? Or is there something special about a gene that becomes a putative pseudogene first?

