So there is this cool new paper out in PLoS Genetics, "Evolutionary Mirages: Selection on Binding Site Composition Creates the Illusion of Conserved Grammars in Drosophila Enhancers," and I have wanted to write about it for a week or so. You see, the paper is about something I have been interested in for most of my career: how the particular processes by which mutations occur can sometimes be biased (i.e., some types of mutations are more common than others), how these biases can create highly ordered patterns in genomes, and how these ordered patterns can in turn be misinterpreted as the result of adaptation. Mistaken claims of adaptation in genomics are a favorite topic of mine, and they led me to create (with tongue in cheek) a new omics word: Adaptationomics.
Anyway, I really, really like this paper. But there is a wee bit of a problem in writing about it. You see, it is by my brother, Michael Eisen, a professor at UC Berkeley, and Richard Lusk, a student in his lab. And, well, I don't want to say anything wrong or stupid about the paper, since my brother will be pissed off. So I have not written about it yet. But then I realized the best way to write about this one is simply to ask my brother for the "Story behind the science" of the paper, as I have been doing for some other recent papers.
If you want a summary of the paper, here it is in their own words:
Author summary: Because mutation is a random process, most biologists assume that apparently non-random features of genome sequences must be the result of natural selection acting to create and preserve them. Where this is true, genome sequences provide a powerful means to infer aspects of molecular, cellular, and organismal biology from the signatures of selection they have left behind. However, recent analyses have shown that many aspects of genome structure and organization that have traditionally been attributed to selection can often arise from random processes. Several groups—including ours—studying the sequences that specify when and where genes should be produced have identified common, seemingly conserved, architectural features, based on which we have proposed new models for the activity of the complex molecular machines that regulate gene expression. However, in the work described here we simulate the evolution of these regulatory sequences and show that many of the features that we and others have identified can arise as a byproduct of random mutational processes and selection for other properties. This calls into question many conclusions of comparative genome analysis, and more generally highlights what Michael Lynch has called the “frailty of adaptive hypotheses” for the origins of complex genomic structures.
Conclusions: Lynch has eloquently argued that biologists are often too quick to assume that organismal and genomic complexity must arise from selection for complex structures and too slow to adopt non-adaptive hypotheses. Our results lend additional support to this view, and extend it to show that indirect and non-adaptive forces can not only produce structure, but also create an illusion that this structure is being conserved. We do not doubt that many aspects of transcriptional regulation constrain the location of transcription factor binding sites within enhancers. Indeed a large body of experimental evidence supports this notion, and we remain committed to identifying and characterizing these constraints. But if this process is to be fueled by comparative sequence analysis, as we believe it must be, it is essential that we give careful consideration to the neutral and indirect forces that we now know can produce evolutionary mirages of structure and function.
I must say I love the lead-in to the title, "Evolutionary Mirages," which is another (but much better) way of saying "Adaptationism is a bad thing."
1. Why did you do this work?
This paper started out as a control. My lab is interested in understanding how the enhancers that control gene expression work - focusing on those that control early development in Drosophila. In 2008, we published a paper showing that when we put enhancers from a distantly related family of flies into Drosophila melanogaster embryos, they drive patterns of expression that are identical to the endogenous D. melanogaster enhancers, even though they have almost no conservation of primary DNA sequence. But since they have the same function, they must have something in common - and so we compared the configurations of transcription factor binding sites in orthologous enhancers across different evolutionary timescales looking for something they shared.
What we found is that binding sites in all of these enhancers occur in clusters. They are closer to each other than one would expect if they were scattered randomly in the ~1,000 bp of an enhancer. And, what's more, sites that were close to each other were far more likely to be conserved. Surely, we thought, this could be no accident. So we proposed that enhancers are organized into compact clusters of sites for one or more factors - and that these "mini modules" are the primary unit of enhancer function.
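To make the "scattered randomly" comparison concrete, here is a minimal sketch of that simple null model. It assumes site positions are drawn uniformly and independently along an enhancer of about 1,000 bp, and it uses the mean nearest-neighbor distance between sites as the clustering statistic. Both the statistic and the function names (`nearest_neighbor_mean`, `clustering_pvalue`) are illustrative choices, not necessarily what was used in the work described here, and the site positions are made up.

```python
import random


def nearest_neighbor_mean(positions):
    """Mean distance from each binding site to its closest neighboring site."""
    if len(positions) < 2:
        raise ValueError("need at least two sites")
    pos = sorted(positions)
    dists = []
    for i, p in enumerate(pos):
        gaps = []
        if i > 0:
            gaps.append(p - pos[i - 1])
        if i < len(pos) - 1:
            gaps.append(pos[i + 1] - p)
        dists.append(min(gaps))
    return sum(dists) / len(dists)


def clustering_pvalue(observed, enhancer_len=1000, n_trials=10000, seed=0):
    """Fraction of random placements at least as clustered as the observed sites.

    Assumed null model: every site is placed independently and uniformly along
    the enhancer, ignoring site width and the underlying sequence.
    """
    rng = random.Random(seed)
    obs_stat = nearest_neighbor_mean(observed)
    hits = 0
    for _ in range(n_trials):
        sim = [rng.randrange(enhancer_len) for _ in observed]
        if nearest_neighbor_mean(sim) <= obs_stat:  # as clustered or more
            hits += 1
    return hits / n_trials


# Hypothetical site positions (bp from the enhancer start), not real data.
sites = [120, 131, 140, 610, 618, 633]
print(clustering_pvalue(sites))
```

A small value means the sites are closer together than uniform scattering would usually produce. The point of the story that follows is that "uniform scattering" may itself be the wrong baseline.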
But as we worked to extend these analyses to whole genomes, we sought a more rigorous, quantitative assessment of just how improbable different levels of binding site clustering were. Like pretty much everyone in the field, we had used a null model in which binding sites were scattered randomly in an enhancer. But I've been working with genomes long enough to know that nothing is ever truly random, and that all kinds of adaptive and non-adaptive processes create patterns in genome sequences that confound simple analyses. I wanted to come up with a more realistic null model for the distribution of sites within an enhancer.
To do this I turned to my graduate student Rich Lusk, a card-carrying population geneticist trained at the University of Chicago. Rich was proud of his status as one of the few members of the lab who didn't work on flies - but I convinced him to put aside the abstract models of binding site evolution in yeast and work on developing a real null model for our studies of enhancer evolution.
The idea was to simulate enhancers evolving without any constraint on the organization of transcription factor binding sites they contain, and to see what happens. But this did not mean letting enhancers evolve neutrally - their extreme functional conservation demonstrates that they are under fairly strong constraint. Since it is pretty clear that these enhancers are responding to the same transcription factors in all of these species, Rich's simulations required that enhancers maintain their binding site composition - but placed no constraints on how the sites were organized relative to each other.
And what we found was striking. Even with no explicit selection on binding site organization, these evolved enhancers had lots of structure! Binding sites were clustered together, and the closer together sites were, the more conserved they were, just like in real enhancers. It made us realize pretty quickly that the patterns we had latched onto (and which many other people were describing in different systems) might not be an evolutionary signature of constraint on the organization of sites within enhancers, but simply a byproduct of selection on binding site composition. If you want details, read the paper! But this has radically altered the way that we look at enhancer evolution.
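For readers who want a feel for what "selection on composition but not organization" means in practice, here is a deliberately toy sketch. It is not the simulation Rich actually ran. It assumes a single sequence rather than a population, point substitutions only, an arbitrary 6-bp motif chosen for illustration, exact forward-strand matches, and a hard rule that a mutation is kept only if the enhancer still has at least `min_sites` binding sites. The names `site_positions` and `evolve` are hypothetical.

```python
import random

BASES = "ACGT"


def site_positions(seq, motif):
    """Start positions of exact forward-strand matches to motif (overlaps allowed)."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def evolve(seq, motif, min_sites, generations, seed=0):
    """Toy mutation-selection loop on one enhancer sequence.

    Each generation proposes one random point substitution. The mutant is kept
    only if it still has at least min_sites matches to motif (selection on
    composition). The rule never looks at where sites are, so any spacing
    pattern that appears is a side effect, not a target of selection.
    """
    rng = random.Random(seed)
    seq = list(seq)
    if len(site_positions("".join(seq), motif)) < min_sites:
        raise ValueError("starting sequence must already satisfy the constraint")
    for _ in range(generations):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice([b for b in BASES if b != old])
        if len(site_positions("".join(seq), motif)) < min_sites:
            seq[i] = old  # reject: too few binding sites left
    return "".join(seq)


# Hypothetical starting enhancer: 1,000 bp of random sequence with four
# planted copies of the illustrative motif.
rng = random.Random(1)
MOTIF = "TAATCC"
start = [rng.choice(BASES) for _ in range(1000)]
for p in (100, 350, 600, 850):
    start[p:p + len(MOTIF)] = MOTIF
start = "".join(start)

end = evolve(start, MOTIF, min_sites=4, generations=3000)
print(site_positions(start, MOTIF), site_positions(end, MOTIF))
```

Sites can be lost and regained at new positions, as long as the total never drops below the threshold. Whether this particular toy reproduces the clustering described above is not claimed here; it only shows the shape of a model in which composition is constrained and arrangement is free.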
2. How did you come up with the title?
Rich and I were writing the paper, and we had some really long, hideous, boring title. As we wrote, the idea that things are not always what they appear to be was at the forefront of my mind. I was thinking about how desperate we and other people in the field were to figure out how enhancers work (it's a vexing problem that has defied decades of work), how we all hoped that evolutionary analysis was going to rescue us, how quickly and eagerly we latched on to the first signs of a signal, and how that was just like a mirage you see in the desert...
3. Any interesting background?
(see 1)
4. When did the work start?
About a year ago. We had been thinking about this for a while, but only when Rich focused on it did things get rolling.
5. Why PLoS Genetics? Did PLoS Biology reject it?
PLoS Genetics was our first choice. PG has become the premier journal for evolutionary genetics: it routinely publishes the most interesting and important work in the field, and everyone reads it. While every paper I've sent there has been heavily scrutinized, the editorial process has been fair (though sometimes agonizingly slow...), each review has been thoughtful, and many reviews (including in this case) have helped to vastly improve the paper.
Lusk, R., & Eisen, M. (2010). Evolutionary Mirages: Selection on Binding Site Composition Creates the Illusion of Conserved Grammars in Drosophila Enhancers PLoS Genetics, 6 (1) DOI: 10.1371/journal.pgen.1000829