The Tree of Life: December 2009

Thursday, December 31, 2009

Evolutionary classification applied to soups & meals

Written by Saul Jacobson - I think to mock me but you never know with him:

Molecular comparisons show that soup on this planet divides into three primary groupings, commonly known as the stews, the broths, and the chowders. The three are very dissimilar, the differences that separate them being of a more profound nature than the differences that separate typical meals, such as steaks and salads. Unfortunately, neither of the conventionally accepted views of the natural relationships among meals--i.e., the five-kingdom taxonomy or the appetizer/dinner dichotomy--reflects this primary tripartite division of the eating world. To remedy this situation we propose that a formal system of meals be established in which, above the level of eating, there exists a new taxon called a "course." Meals on this planet would then be seen as comprising three courses, the appetizer, the main course, and the dessert, each containing two or more plates.

Good last minute place to donate $$ - Explorit Science Center in Davis

Just got an email saying that the Explorit Children's Science Center in Davis is hurting for funding. See below - please consider donating to this worthy cause ...

Cash-strapped Explorit appeals to community for support

For 28 years, Explorit Science Center has been a gem of the Davis community and an educational boon to the Sacramento region, providing hands-on science opportunities for thousands of families and schools.

Now, the center is confronting a challenge that science can't address. Financial struggles are threatening the existence of Explorit's main site at 2801 Second St., which was purchased in 2006.
In response, the center is appealing to the community for support.

"We know this is a philanthropic community, and one that strongly believes in furthering the education of our children," says Lou Ziskind, executive director. "We need help to keep the Explorit mission alive."

Ziskind attributed the financial difficulties to the down-turned economy and unfortunate timing in the center's purchase of the Second Street site.

"Government funding for education is down, corporate donations and grants are scarce, and understandably, individuals just cannot contribute like they could a couple years back," he said. "Income from program fees, also a key revenue source, have shrunk as well."

"The bigger concern, however, is that in 2006, when the economy seemed sound and real estate values were rising, we obtained bridge financing to complete the purchase of our building," Ziskind said. "With a drop-off in income, we are struggling with the repayment terms of our $1.6 million loan. Without additional financial support, our ability to run our Second Street facility will be in jeopardy."

Ironically, the museum is thriving in many ways. It is breaking attendance records -- 2,391 visitors in November was a 40 percent increase over November 2008 -- and visitor evaluations have been very positive.

"We've finished our phased-in opening of the museum so there is more fun stuff to do than ever," Ziskind said. "We've also been getting great media attention with new attractions like our streambed table and new promotions like 'Toddler Tuesday.'"

Explorit's Board of Directors and staff believe that the need for the center's informal science education programs is greater than ever.

"In Davis, we're fortunate to have a school budget that allows for science education in elementary classrooms," said Betsy Elzufon, an Explorit trustee and mother of two. "Sadly, in many other towns, the students' only hands-on science experiences are those provided by Explorit's traveling programs. For those schools, it is truly a blessing to have us help fill the void and educate these students about the importance science lends to our daily lives."

To keep the Explorit vans rolling and the museum open, Explorit offers a package of ways to lend support, says Peter Willson, the development director. The organization is a nonprofit 501(c)(3) organization; donations are tax-deductible.

"We have a 'Donate' button on our website, of course," he said, referring to www.explorit.org. "That's the easiest and fastest. We also gladly take cash or check donations at the museum during visiting hours or through the mail (to Explorit at P.O. Box 1288, Davis CA 95617). Then our website has lots of other information about ways to help, including a new car-donation program."

He and board members are hopeful that the center will weather the recession.

"This place has been a great part of the community for over a quarter-century and has meant so much to the families in Davis and the greater Sacramento region," Elzufon said. "It would be a sad day if we were not able to continue offering our many valuable programs to the many families, schools, and educators who depend on us."

For more information about Explorit, please contact Peter Willson at (530) 757-4530 ext. 112 or e-mail him at PeterW@explorit.org.

Top days of the week for 2009 stunner: Tuesday #1, Friday last place

Well, the results of our detailed survey are in. With the help of people from all around the world, we have classified and rated days of the week for 2009 based upon a large number of criteria. A really large number of criteria. And the rankings are:

Tuesday. In a big surprise, Tuesday won out over the other days. Our analysis indicates that this was due more to dislike of other days than to the popularity of Tuesday itself. In particular, the problems this year with Friday not only took down Friday, but also Thursday (see below). In addition, Tuesday allied itself early on with the always popular Saturday when Saturday broke with Friday over the furlough issue. And Tuesday always scores well with the Monday haters, because when it comes, that is proof they made it through their nemesis. Congratulations, Tuesday.
Saturday. Riding on Tuesday's coattails and its own strength as the first day of the weekend, Saturday moved up from last year's poor performance to snag second place. Clearly the break with Friday paid off, as apparently did Saturday's decision not to run any negative campaigns against Sunday as it did last year with its "Sunday, the day before Monday" campaign.
Sunday. Despite a general impression of being dull and boring, Sunday this year snagged its traditional third slot because people viewed it as a safe bet and it still won over the traditional "weekend" voters. Also - Sunday negated the negative ads from last year by hiring a skilled rapid response team that posted humorous YouTube videos mocking any negative ads that came out.
Wednesday. Rolling in at fourth position is Wednesday. Never popular. Never disliked. But always right in the middle. Normally in competition with Tuesday and Thursday for the fourth spot, Wednesday beat out Thursday by a wide, wide margin this year, most likely due to the Friday issue, but simply could not compete with Tuesday's excellent campaign.
Thursday. What a downturn for Thursday. Last year third. This year the bottom of the middle in a deep fifth place. Sadly, this was something out of Thursday's control as the enormous unpopularity of Friday simply killed Thursday's campaigning. Many see this as a fair turn of events, as Thursday has previously ridden its position near the end of the work week to popularity despite not really doing much of anything special.
Monday. The big shocker this year. Monday moved up from its traditional seventh spot to get a #6. Clearly, this was in large part due to some issues with Friday, but Monday also ran a very smooth campaign focusing on its strengths (e.g., many holidays) and acknowledging rather than ignoring its weaknesses. Monday has publicly stated that it is aiming for #5 next year, at whatever the cost. Our prediction - not going to happen.
Friday. From first to last, in one year. Friday won last year and in many previous years, when it usually duked it out with Saturday for the top spot. But this year Friday was killed by furloughs. Not only did the people who were furloughed on Friday vote against it, but so did all the people who had to wait in line at DMVs and other government offices. In addition, Saturday and Thursday teamed up for a brilliant ad campaign pointing out that just about everyone knew someone who was furloughed, and they suggested early in the year that it would be immoral to even "feel" happy on Friday. This campaign killed Friday's chances early on. And with the financial issues continuing, it looks like Friday may have trouble next year too.

Tuesday, December 29, 2009

More coverage of the GEBA "Phylogeny Driven Genomic Encyclopedia"

Just a quick note here to post some links to additional stories about my new paper on "A phylogeny driven genomic encyclopedia of bacteria and archaea" which was published last week in Nature (with a Creative Commons license - which is rare in Nature but is what they use for genome sequencing papers).

Carl Zimmer has an article today in the New York Times "Scientists Start a Genomic Catalog of Earth’s Abundant Microbes" about the paper and the project. In the article he interviews me and Hans-Peter Klenk, who was a co-author and led the culturing part of the project. A few things to note about this:

It is rare to have archaea mentioned in the New York Times.
There is a tree that goes along with the article which is a modified version of the tree we had in our paper. I think theirs is very nice. Kudos to their artist.
There is a quote by Norm Pace generally supportive of the project.
The article mentions the JGI Adopt a Microbe program and even has a shout out to Malcolm Campbell at Davidson College and his recent PLoS One paper where they discuss results from a project where they took one of the genomes from our project and used it as part of a course on genome annotation/analysis.

For some of the story behind the paper see my recent blog post "Story Behind the Nature Paper on 'A phylogeny driven genomic encyclopedia of bacteria & archaea' #genomics #evolution"

Other discussions worth checking out

John Timmer's article for Ars Technica on "Presenting a genomic encyclopedia of bacteria (and archaea)"
The Department of Energy is featuring the project as part of their "National Impact" series: "Scientists Launch the Genomic Encyclopedia of Bacteria and Archaea"
NYTimes Science Times discussion from Charlie Petit at the Knight Science Journalism tracker

Also see

Archaea Make the Big Time from Genome Technology
Woodland Daily Democrat (local paper): Encyclopedia of microbe genomes released
Cory Golden at The Davis Enterprise wrote a nice story "Researchers urge new take on microbes" - not sure how long this stays online or how people access it though
Microbe World has a bit on it
Leonardo Martins has a really nice round up here
The ScientificBlogging staff have written a bit about it here
R&D mag Sr Editor Paul Livingstone has an interesting take on the story: Obsessive compulsive taxonomy
Green Car Congress with mostly material from the press releases here.
MyCor Web has a nice discussion of the paper

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B., Hooper, S., Pati, A., Lykidis, A., Spring, S., Anderson, I., D’haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E., Kyrpides, N., Klenk, H., & Eisen, J. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462 (7276), 1056-1060. DOI: 10.1038/nature08656

Bakke, P., Carney, N., DeLoache, W., Gearing, M., Ingvorsen, K., Lotz, M., McNair, J., Penumetcha, P., Simpson, S., Voss, L., Win, M., Heyer, L., & Campbell, A. (2009). Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis. PLoS ONE, 4 (7). DOI: 10.1371/journal.pone.0006291

Thursday, December 24, 2009

Story Behind the Nature Paper on 'A phylogeny driven genomic encyclopedia of bacteria & archaea' #genomics #evolution

Today is a fun day for me. A paper on which I am the senior author is being published in Nature (yes, the Academic Editor in Chief of PLoS Biology is publishing a paper in Nature, more on that below ...). This paper, entitled "A phylogeny driven genomic encyclopedia of bacteria and archaea," represents the culmination of years of work by many people from multiple institutions. Today in this blog I am going to do my best to tell the story behind the paper - about the people and the process and a little bit about the science.

First, a brief bit about the science in the paper. In this paper, we (mostly people at the Joint Genome Institute, where I have an Adjunct Appointment -- but also people in my lab at UC Davis and at the DSMZ culture collection) did a relatively simple thing - we started with the rRNA tree of life as a guide. Then we identified branches in the bacterial and archaeal portions of this tree where there were no genome sequences available (or in progress) (this was done mostly by Phil Hugenholtz, Dongying Wu and Nikos Kyrpides). Next we searched for representatives of these "unsequenced" branches in the DSMZ culture collection (a collection of bacteria and archaea that can be grown in the lab). And we identified in total some 200 of these. And then the DSMZ (under the direction of Hans-Peter Klenk) grew these organisms and sent the DNA to the Joint Genome Institute. And then JGI turned on their genome sequencing muscle and sequenced the genomes of the organisms in the DNA samples. And finally, we spent a good deal of time analyzing the data, asking a pretty simple question - are there any general benefits that come from this "phylogeny driven" approach to sequencing genomes compared to what one might find with sequencing just any random genome (after all, any genome sequence could have some value)? The paper describes what we found, which is that there are in fact many benefits that come from sequencing genomes from branches in the tree for which genomes are not available.
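One way to make the "benefit" question concrete is to ask how much new branch length a candidate genome adds to the part of the tree already covered by sequenced genomes (often called phylogenetic diversity). The sketch below is only an illustration under assumptions: the toy tree, its branch lengths, and the function names are hypothetical and are not taken from the paper's analysis.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy tree: child -> (parent, branch length). The root has no entry.
Tree = Dict[str, Tuple[str, float]]

TREE: Tree = {
    "n1": ("root", 1.0),
    "n2": ("root", 3.0),
    "A": ("n1", 0.5),
    "B": ("n1", 0.5),
    "C": ("n2", 1.0),
}


def edges_to_root(tree: Tree, node: str) -> Set[str]:
    """Return the edges (each named by its child node) between node and the root."""
    edges: Set[str] = set()
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def phylogenetic_diversity(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum the branch lengths on the union of root-to-tip paths for the given taxa."""
    covered: Set[str] = set()
    for taxon in taxa:
        covered |= edges_to_root(tree, taxon)
    return sum(tree[edge][1] for edge in covered)


def pd_gain(tree: Tree, sequenced: Set[str], candidate: str) -> float:
    """Branch length newly covered if the candidate were sequenced as well."""
    before = phylogenetic_diversity(tree, sequenced)
    after = phylogenetic_diversity(tree, sequenced | {candidate})
    return after - before


# With only "A" sequenced, its close relative "B" adds little,
# while "C" sits on a long branch that nothing else covers.
gain_b = pd_gain(TREE, {"A"}, "B")
gain_c = pd_gain(TREE, {"A"}, "C")
```

In this toy case the distant organism "C" contributes far more uncovered branch length than "B", which is the intuition behind choosing genomes from unsequenced branches rather than at random.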

More on the details of the science below. But first, I want to note that this paper was truly an amazing team effort, with all sorts of people from the JGI in particular, going above and beyond the call of duty to make sure it happened and worked well. And the Department of Energy has been truly phenomenal in my opinion in supporting this project which in the end is not explicitly about "energy" per se but is really about providing a reference set of genomes that should improve the value of all microbial genome data.

Anyway, now for the story behind the story. And be prepared, because this is a bit long. But I think it is important to place this work in a bigger context both in terms of my background as well as some of the background of other people in the project. If you can't wait for more on the GEBA project then perhaps you should go to some of these links:

Videos of talks I have given on the project:

Podcast of interview of me for ASM's Meet the scientist
Stories about GEBA

Nature News from 11.17.2009

Stories about our paper

Nature News
GenomeWeb "GEBA Researchers Publish Results from Dozens of Bacterial, Archaeal Genomes"
Ars Technica article "Presenting a genomic encyclopedia of bacteria (and archaea)" by John Timmer
Iddo Friedberg blogged about it
The OpenHelix Blog on it
Leonardo Martins blogs about it here and helps translate a Spanish story about the project
R&D magazine has a post based on the press releases here
NY Times story by Carl Zimmer here.

FriendFeed Discussions here (includes a thread about Nature using a Creative Commons license)

And I will post more links as they come up. Below, I try to provide some of the story behind the story:

My personal interest in applied uses of phylogenetics stage 1: undergraduate preparation at Harvard
As this paper is primarily about an applied use of phylogenetics (in selecting genomes for sequencing), I thought it would be worth going into some of how I personally became a bit obsessed with applied uses of phylogenetics. For me, my obsession began as an undergraduate at Harvard where I got exposed to the value of phylogeny as a tool from many many angles including but not limited to:

Freshman year taking a course taught by Stephen Jay Gould where Wayne and David Maddison were Teaching Assistants and where they were demoing their new phylogenetics software called MacClade
Sophomore year taking a conservation biology class with Eric Fajer and Scott Melvin where I was exposed to the concept of "phylogenetic diversity" as a tool in assessing conservation plans
Junior year working in the lab of Fakhri Bazzaz with people like David Ackerly and Peter Wayne who made use of phylogeny as a key tool in their research projects
Senior year and the year after graduating where I worked in the lab of Colleen Cavanaugh using rRNA based phylogenetic analysis to characterize uncultured chemosynthetic symbionts. I note it was in Colleen's lab that I also became obsessed you could say with microbes and why they rock.

My personal interest in applied uses of phylogenetics stage 2: graduate school at Stanford

All of this and more gave me a strong passion for phylogeny as a tool. And so I went to graduate school at Stanford (originally to work with Ward Watt on butterflies, but then I switched to working in Phil Hanawalt's lab on the "Evolution of DNA repair genes, proteins and processes"). And while in that lab I became pretty much obsessed with three things, all related to phylogeny.

First, I was interested in whether the rRNA tree of life, which I had used in my studies in Colleen Cavanaugh's lab (and in my first paper in J. Bacteriology, which, thanks to ASM, is now in Pubmed Central and free at ASM's site too), was robust or, as some critics argued, was not that useful. This was a critical question since the best way to study the phylogeny of microbes at the time, and also the best way to study uncultured microbes, was to leverage the ability to clone rRNA genes by PCR and then to build evolutionary trees of those rRNA genes. As part of my graduate work, I did a study where I compared the phylogenetic trees of rRNA to trees of another gene from the same species (I chose recA). Surprisingly, despite the claims that the rRNA tree was not very useful and that different genes always gave different trees, if you compared the two trees ONLY where there was strong support for a particular branching pattern, the trees of the two genes were in fact VERY VERY similar (a finding that had been suggested previously by others, including Lloyd and Sharp).
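To illustrate what "comparing two trees only where there is strong support" can mean in practice, here is a minimal sketch. It assumes each tree has already been reduced to a set of splits (groups of taxa on one side of a branch) with a support value between 0 and 1; the taxa, the support values, the 0.7 threshold, and the function names are all hypothetical and do not come from the original study.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Split = FrozenSet[str]

TAXA: FrozenSet[str] = frozenset("ABCDE")

# Hypothetical splits with support values for two gene trees of the same taxa.
RRNA_TREE: Dict[Split, float] = {
    frozenset("AB"): 0.95,
    frozenset("ABC"): 0.40,
}
RECA_TREE: Dict[Split, float] = {
    frozenset("AB"): 0.90,
    frozenset("ABD"): 0.35,
}


def compatible(a: Split, b: Split, taxa: FrozenSet[str]) -> bool:
    """Two splits fit on one tree iff some pair of their sides shares no taxa."""
    a_sides = (a, taxa - a)
    b_sides = (b, taxa - b)
    return any(not (x & y) for x, y in product(a_sides, b_sides))


def strong_conflicts(
    tree1: Dict[Split, float],
    tree2: Dict[Split, float],
    taxa: FrozenSet[str],
    threshold: float = 0.7,
) -> List[Tuple[Split, Split]]:
    """List pairs of incompatible splits where both have support >= threshold."""
    strong1 = [s for s, support in tree1.items() if support >= threshold]
    strong2 = [s for s, support in tree2.items() if support >= threshold]
    return [(s1, s2) for s1 in strong1 for s2 in strong2
            if not compatible(s1, s2, taxa)]


# Only well-supported branches: the two trees do not disagree.
conflicts_strong = strong_conflicts(RRNA_TREE, RECA_TREE, TAXA)
# Every branch counted, however weak: one disagreement appears.
conflicts_all = strong_conflicts(RRNA_TREE, RECA_TREE, TAXA, threshold=0.0)
```

In this toy data the only disagreement between the trees involves two weakly supported branches, so it disappears once poorly supported branches are ignored.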

Second, I also became obsessed with the fact that most of the experimental studies of DNA repair processes were in a very narrow sampling of the phylogenetic diversity of organisms (e.g., at the time, no studies had been done in archaea, and most studies in bacteria were from only two of the many major groups). So I started experimental studies of repair in halophilic archaea in order to help broaden the diversity of studies. And I began to use PCR to try and clone out repair genes from various other species of diverse bacteria and archaea. Alas, as I was doing this, some institute called TIGR was sequencing the complete genomes of organisms I was trying to clone out single genes from. In fact, one of the first organisms I was working on for PCR studies was Archaeoglobus fulgidus. And when I found out TIGR was sequencing the genome, in a project led by none other than the great microbial evolutionary biologist Hans-Peter Klenk (yes, the same one who helped us in this GEBA project), I decided it was silly to try to clone out individual genes by PCR. And thus I began to learn how to analyze genomes.

It was in the course of learning how to analyze genomes that I came up with another applied use of phylogeny. I realized that one should be able to use phylogenetic studies of genes to help in predicting functions for uncharacterized genes as part of genome annotation efforts. And so I wrote a series of papers showing that this in fact worked (I did this first for the SNF2 family of proteins and then alas coined my own omics word "phylogenomics" to describe this integration of genome analysis and phylogenetics and formalized this phylogenomic approach to functional prediction). I note that what I was arguing for was that protein function could be treated like ANY other character trait and one could use character trait reconstruction methods (which I had learned about while playing with that MacClade program) to infer protein functions for unknown proteins in a protein tree. I note that this notion of predicting protein function from a protein tree is completely analogous to (and one could rightfully say stolen from) how researchers studying uncultured microbes were trying to infer properties of microbes from the position of their rRNA genes in the rRNA tree of life (as I had learned in studies of symbioses).
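The idea of treating protein function as a character on a gene tree can be sketched with a simple parsimony pass. The code below is an illustration only, assuming a binary tree written as nested tuples and made-up function labels: it runs a Fitch-style bottom-up pass that skips unannotated leaves, then lets each unannotated leaf inherit the candidate set handed down from its closest ancestor. The tree, the labels, and the function names are hypothetical, and this is not the procedure from the papers mentioned above.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A node is either a leaf name or a pair of child nodes (binary tree assumed).
Node = Union[str, Tuple["Node", "Node"]]
States = Optional[Set[str]]  # None means "no information yet"

# Hypothetical gene tree: protein "q" has no known function.
GENE_TREE: Node = ((("p1", "p2"), "q"), ("r1", "r2"))
KNOWN: Dict[str, Optional[str]] = {
    "p1": "repair", "p2": "repair", "q": None,
    "r1": "transcription", "r2": "transcription",
}


def fitch_up(node: Node, known: Dict[str, Optional[str]],
             sets: Dict[Node, States]) -> States:
    """Bottom-up pass: record the candidate function set for every node."""
    if isinstance(node, str):
        func = known.get(node)
        sets[node] = {func} if func is not None else None
        return sets[node]
    left, right = (fitch_up(child, known, sets) for child in node)
    if left is None or right is None:
        result = left if right is None else right   # ignore unannotated side
    else:
        result = (left & right) or (left | right)   # intersection, else union
    sets[node] = result
    return result


def predict_functions(tree: Node,
                      known: Dict[str, Optional[str]]) -> Dict[str, States]:
    """Return candidate functions for each leaf that lacks an annotation."""
    sets: Dict[Node, States] = {}
    fitch_up(tree, known, sets)
    predictions: Dict[str, States] = {}

    def down(node: Node, inherited: States) -> None:
        own = sets[node]
        if own is None:
            final = inherited
        elif inherited and (own & inherited):
            final = own & inherited
        else:
            final = own
        if isinstance(node, str):
            if known.get(node) is None:
                predictions[node] = final
        else:
            for child in node:
                down(child, final)

    down(tree, None)
    return predictions


predicted = predict_functions(GENE_TREE, KNOWN)
```

Here the unannotated protein "q" is assigned the function of the subfamily it branches within, rather than that of whichever protein merely looks most similar, which is the point of doing the prediction on a tree.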

My personal interest in applied uses of phylogenetics stage 3: my plans for a post doc

So as I was wrapping up graduate school I was seeking a way to go beyond what I was doing and combine studies of DNA repair and evolution and microbiology in another way. And I thought I had found a perfect one in a post doc I accepted with A. John Clark at U. C. Berkeley. John was the person who had discovered recA, the gene I had been using for phylogenetic analysis and for structure function studies. And he was working with none other than Norm Pace and a young hotshot in Norm's lab, Phil Hugenholtz (as well as a few others including Steve Sandler) in trying to use the recA homolog in archaea as a marker for environmental studies of archaea. It sounded literally perfect. And so I was excited to start this job. That was, until I met Craig Venter.

Grabbing the TIGR by the tail

While I had been playing around with data from TIGR in the latter years of my time in graduate school, I also got involved in teaching a fascinating class with David Botstein, Rick Myers, David Cox and others. (As an aside, this class was part of a new initiative I helped design at Stanford on "Science, Math and Engineering" for non science majors - an initiative that was a pet project of none other than Condie Rice who was Provost at the time). Anyway, Rick Myers was serving as a host for one Craig Venter when he came and gave a talk at Stanford and somehow I managed to finagle my way into being invited to go out to dinner with Craig. And at dinner, I proceeded to tell Craig that I thought some of the evolution stuff he was talking about was bogus and I tried to explain some of my work on phylogeny and phylogenomics. Not sure what Craig thought of the cocky PhD student drawing evolutionary trees on napkins, but it eventually got me a faculty job at TIGR and I worked extensively with Craig so it must have been worth something. And so I and my fiancée Maria-Inés Benito (now wife ...) moved to Maryland and spent eight great years there (my wife started in MD as a faculty member at TIGR too, but then she left to go to a company called Informax, may it rest in peace).

Most of my work at TIGR focused on a different side of phylogenomics than represented in the GEBA project. At TIGR I focused on the uses of evolutionary analysis as a component to analyzing genomes - from predicting gene function to finding duplications (e.g., see the V. cholerae genome paper) to identifying genes under unusual patterns of mutation or selection to finding organelle derived genes in nuclear genomes (e.g., see this) to studying the occurrence of lateral gene transfer or the lack of occurrence of it to studying genome rearrangement processes. And sure, every once in a while I worked on a project where the organism was the first in its major branch to have a genome sequenced (e.g., Chlorobi). And I had noted, along with others, that there was a big phylogenetic bias in genome sequencing projects (e.g., see my 2000 review paper discussing this here).

But that did not really drive my thinking about what genome to actually sequence until TIGR hired a brilliant microbial systematics expert Naomi Ward as a new faculty member. And it was Naomi who kept emphasizing that someone should go about targeting the "undersequenced" groups in the Tree of Life.

NSF Assembling the Tree of Life grant
And so Naomi and I (w/ Karen Nelson and Frank Robb) put together a grant for the NSF's "Assembling the Tree of Life" program to do just this - to sequence the first genomes from eight phyla of bacteria for which no genomes were available but for which there were cultured organisms. Amazingly we got the grant. And we did some pretty cool things on that project, including sequencing some interesting genomes, and developing some useful new tools for analyzing genomes (e.g., STAP, AMPHORA, APIS). And I was able to hire some amazing scientists to work in my lab on the project including Dongying Wu (the lead author on the GEBA paper) and Martin Wu (who also worked on the GEBA project and is now a Prof. at U. Virginia) and Jonathan Badger. Alas, we did not publish any earth shattering papers as part of this NSF Tree of Life project on analyzing the genomes of these eight organisms, not because there was not some interesting stuff there but for some other reasons. First, I moved to UC Davis and there was a complicated administrative nightmare in transferring money and getting things up and running at Davis on this project so my lab was not really able to work on it for two years (in retrospect, what a f*ING nightmare dealing with the UC Davis grants system was ...).

Then, just as things were ready to get restarted, TIGR kind of imploded and many of the people, including Naomi, my CoPI, left (though I note, my moving to Davis was unrelated to the dissolution of TIGR). But perhaps most importantly, there were some actual technical and scientific problems with our dreams of changing the world of microbiology from our phyla sampling project - the science was not quite there. In particular, having a single genome from each of these phyla was simply not enough to get (and show) the benefits that can come from improved sampling of the tree of life. And thus though we have published some cool papers from this project (e.g., see this PLoS One paper on one of the genomes), we all ended up, in one way or another, disappointed with the final results.

Davis and JGI: the return of phylogeny to genomic sampling

When I moved to UC Davis I also was offered (and accepted) an Adjunct Appointment at the Joint Genome Institute (JGI). At both places, I envisioned reinventing myself as someone who worked on studying microbes directly in the environment (e.g., with metagenomics) and symbioses (both of which I had started on at TIGR). And in fact, in a way, I have done this, since I got some medium to big grants to work on these issues. I tried diligently to attend weekly meetings at the JGI but it was difficult since technically I was 100% time at UC Davis and was in essence supposed to be at 0% time at JGI. And when JGI hired Phil Hugenholtz to run their environmental genomics/metagenomics work, I was needed less at JGI since, well, Phil was so good. It was great to go over there and interact with Eddy Rubin, Phil Hugenholtz, and Nikos Kyrpides, among others, but it was unclear what exactly I would do there with Phil running the metagenomics show.

And then, like magic, something came up. I went to one of the monthly senior staff meetings at JGI and while we were listening to someone on the speaker phone, Eddy Rubin handed me a note asking me what I thought about the proposal someone was making to sequence all the species in the Bergey's Manual. And the light bulb of phylogeny went back on in my head. I told him (I think I wrote it down, but may have said out loud), something like "well, sequencing all 6000 or so species would be great, but it would be better to focus on the most phylogenetically novel ones first." And in a way, GEBA was born. Eddy organized some meetings at JGI to discuss the Bergey's proposal and I argued for a more phylogeny driven approach. And this is where having Phil Hugenholtz and Nikos Kyrpides at JGI was like a perfect storm. You see, both had been lamenting the limited phylogenetic coverage of genomes for years, just like I had. Phil had even written a paper about it in 2002 which we used as part of our NSF Tree of Life proposal. And Nikos too had been diligently working for years to make sure novel organisms were sequenced. So though we went to a meeting to discuss the Bergey's manual idea, we instead proposed an alternative - GEBA.

And for some months, we pitched this notion to various people including at JGI, DOE, and various advisory boards. And the response was basically - "OK - sounds like it COULD be worth doing - why don't you do a pilot and TEST if it is worth doing." And so, with support from Eddy Rubin and DOE, that is what we did.

One key limitation - getting DNA

So Phil, Nikos and I and a variety of others started working on the general plan behind GEBA. But there was one key limitation. How were we going to get DNA from all these organisms? One possibility was to seek out diverse people in the community and have them somehow help us. This had some serious problems associated with it, not the least of which was the worry that we might end up sequencing varieties of organisms that people had in their lab but which nobody else had access to (something Naomi Ward and I had written about as a problem a few years before).

And here came the second perfect storm - none other than Hans-Peter Klenk (yes, the same one who had led some of the early genome sequencing efforts when he was at TIGR), was visiting JGI. And he had a relatively new job - at the German Culture Collection DSMZ (In fact, I should note, I had tried to get a job at TIGR even before I met Venter, since they had a position advertised for a "microbial evolutionary biologist" --- but that job went to Klenk). Phil Hugenholtz had asked the Head of DSMZ, Erko Stackebrandt, if they might be interested in helping us grow strains and get DNA but we did not yet have a full collaboration with them. And Erko had suggested we contact Hans-Peter. And in his visit to JGI it became apparent that he would do whatever he could to help us build a collaboration with DSMZ. And thus we had a source of DNA. Even more amazingly to me, they did it all for free.

GEBA begins

And thus began the real work in the project. Phil used his expertise with rRNA databases, especially GreenGenes, to pull out phylogenetic trees of different groups. And Nikos used his expertise as the curator of a database on microbial sequencing projects (called GenomesOnline) to help tag which branches in Phil's tree had sequenced genomes or ones in progress. And then they looked for whether any of the members of the unsequenced branches had representatives in the DSMZ collection. And with some help from Dongying Wu and me, we came up with a list. And with the help of the JGI "Project Management" team including David Bruce and Lynne Goodwin and Eileen Dalin and others at JGI we developed a protocol for collaborating with DSMZ and getting DNA from them.
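The selection steps described above (tag which branches already have genomes, then look for culturable representatives of the rest) can be sketched roughly as follows. This is only a toy illustration: the clade names, strain identifiers, and the function name are hypothetical placeholders, and real work of this kind would draw on the rRNA trees, GenomesOnline records, and the DSMZ catalog rather than hard-coded sets.

```python
from typing import Dict, List, Set

# Hypothetical inputs: members of each branch of an rRNA tree, the organisms
# that already have a genome finished or in progress, and the strains that a
# culture collection can grow and ship DNA for.
CLADES: Dict[str, List[str]] = {
    "clade_1": ["strain_a", "strain_b"],
    "clade_2": ["strain_c", "strain_d"],
    "clade_3": ["strain_e"],
}
SEQUENCED: Set[str] = {"strain_a"}
IN_COLLECTION: Set[str] = {"strain_b", "strain_d"}


def pick_targets(clades: Dict[str, List[str]], sequenced: Set[str],
                 collection: Set[str]) -> Dict[str, str]:
    """For each branch with no genome, pick one member the collection holds."""
    targets: Dict[str, str] = {}
    for clade, members in clades.items():
        if any(member in sequenced for member in members):
            continue  # branch already has a genome done or in progress
        available = [member for member in members if member in collection]
        if available:
            targets[clade] = available[0]  # first available representative
    return targets


targets = pick_targets(CLADES, SEQUENCED, IN_COLLECTION)
```

In this toy run, clade_1 is skipped because it already has a genome, clade_2 yields a target, and clade_3 drops out because none of its members are in the collection, mirroring the practical constraint that only culturable organisms could be sequenced.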

And I became the chief cheerleader and administrator of the project, in part since Phil and Nikos were so busy with their other things at JGI. And though I was not always on the ball, the project moved forward and we started to get genomes sequenced using the full strength of the JGI as a genome center. The finishing teams at JGI worked diligently on finishing as many of the genomes as possible. And Nikos' team at JGI made sure that the genomes were annotated. And I helped make sure that the data release policies were broadly open (which everyone at JGI supported). And after many false starts with papers on the project that were way way way too cumbersome and big, with some kicks in the pants from the director of JGI Eddy Rubin who was getting anxious about the project, we turned out the GEBA paper that was published today in Nature.

You might ask why I, as a PLoS official and PLoS cheerleader, ended up having a paper in Nature. Well, in the end, though I am senior author on the paper, the total contribution to the work mostly came from people at JGI who did not work for me but instead worked with me on this great project. And we took some votes and had some discussions and in the end, despite my lobbying to send it to PLoS Biology, submitting it to Nature was the group decision. I supported this decision in part due to the fact that Nature uses a Creative Commons license for genome papers. But I also supported it because in the end, this was a collaboration involving many many many people and in such projects everyone has to compromise here and there. Now mind you, I am not sad to have a paper in Nature. But I would personally have preferred to have it in a journal that was fully open access, not just occasionally open like Nature.

Now I note, there were a million other things that went on associated with the GEBA project. Some of which I was not even involved in in any way. I will try to write about some of these another time, but this post is already way way way too long. So I am going to just stop here and add that I have been honored and lucky to work with people like Phil, Nikos, Hans-Peter, and others on this project and to have the people at the JGI work so hard on the background parts of this project. Thanks to all of them and to the people at DSMZ and in my lab who helped out and to the DOE for funding this work (as well as the Gordon and Betty Moore Foundation, who funded some of the work from my lab on analysis of these genomes). And last but not least, thanks to the Director of JGI Eddy Rubin, for supporting this project and for being patient with it and for kicking us in the pants when we needed to get moving on getting a paper out.

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B., Hooper, S., Pati, A., Lykidis, A., Spring, S., Anderson, I., D’haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E., Kyrpides, N., Klenk, H., & Eisen, J. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462 (7276), 1056-1060. DOI: 10.1038/nature08656

Tuesday, December 22, 2009

Story behind the story for new #PLoSOne paper on Bayesian phylogenetics

There is an interesting new paper in PLoS One, "Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics" by Bryan Kolaczkowski and Joseph Thornton. The work focuses on methods for inferring phylogenetic history and in particular two types of statistical approaches: Likelihood and Bayesian. These methods are related to each other in that both attempt to use statistical models of evolution and then test different possible phylogenetic trees relating taxa by how well certain data sets about those taxa map onto the different possible trees. What they did in this new paper was test these methods with some simulations and with some mathematical analyses. And somewhat surprisingly, they find that Bayesian methods, which have become more popular recently, appear to be more prone to errors than likelihood methods when the data sets have multiple not closely related taxa with long branches. (Note if you want to learn more about phylogenetic methods, you can look at the online chapter (html format or PDF) from my Evolution Textbook, though I confess this needs a bit of revision, which I am working on now).

What they see in these cases is that the taxa with long branches group together, something known generally as "Long Branch Attraction" (LBA). Though there have been many previous studies of LBA, most have ended up showing that statistical methods are less prone to this problem than other phylogenetic methods, like distance and parsimony methods. What is surprising in this new work is that they find that Bayesian methods are highly prone to LBA - and much more so than likelihood methods.
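For readers who have not seen LBA in action, here is a small Python sketch of the classic parsimony version of the problem. To be clear, this is not the Bayesian analysis from the paper and not the authors' code. It assumes a simple two-state symmetric model on a four-taxon tree ((A,B),(C,D)), with A and C on long branches; the function names and the change probabilities (0.4 for long branches, 0.05 for short ones) are made up for illustration. It computes how often each parsimony-informative site pattern is expected.

```python
from itertools import product


def pattern_prob(leaves, p_long, q_short):
    """Probability of one site pattern on the true tree ((A,B),(C,D)).

    Two-state symmetric model. A and C sit on long branches (change
    probability p_long); B, D and the internal branch are short (q_short).
    """
    a, b, c, d = leaves

    def step(x, y, change):
        # Probability of going from state x to state y along one branch.
        return change if x != y else 1.0 - change

    total = 0.0
    # Sum over the unobserved states u, v at the two internal nodes.
    for u, v in product((0, 1), repeat=2):
        total += (0.5
                  * step(u, a, p_long) * step(u, b, q_short)
                  * step(u, v, q_short)
                  * step(v, c, p_long) * step(v, d, q_short))
    return total


def informative_pattern_probs(p_long, q_short):
    """Expected frequency of the three parsimony-informative patterns."""
    # Factor 2 counts each pattern together with its 0/1 complement.
    return {
        "AB|CD": 2 * pattern_prob((0, 0, 1, 1), p_long, q_short),  # true tree
        "AC|BD": 2 * pattern_prob((0, 1, 0, 1), p_long, q_short),  # long branches together
        "AD|BC": 2 * pattern_prob((0, 1, 1, 0), p_long, q_short),
    }


probs = informative_pattern_probs(p_long=0.4, q_short=0.05)
favoured = max(probs, key=probs.get)
print(probs)
print("pattern expected most often:", favoured)
```

With these assumed values, sites grouping the two long branches (AC|BD) are expected more often than sites supporting the true grouping (AB|CD), so adding more data only makes parsimony more confident in the wrong tree. If all branches are given the same short length, the true grouping wins again.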

Anyway, for more on this one could read the paper. But what I thought might be interesting is to ask the authors for more detail directly. I am hoping to do this more and more with PLoS papers in the future. I was inspired to do this, in fact, by one of the authors of this paper, Joe Thornton. He sent me an email with a link to the paper saying he thought I might be interested in it (true) and that he felt that it was his job in part for a PLoS One paper to make sure it got read by the right audience so he was hoping I might blog about it. And I said sure, but only if he gave me some of the "story behind the story". So here it is below:

Why did you do these experiments?

Why did we do these experiments? A few years ago, we were studying the behavior of Bayesian posterior probabilities on clades -- whether or not they accurately predict the probability that a clade is true, and what kinds of conditions might cause them to deviate from this ideal. We found that when the true tree was in the Felsenstein zone (two non-sister long branches separated by short branches), the long branches were often incorrectly grouped together with strong support. This was just a small part of a much larger paper that was published in MBE in 2008. The suggestion that Bayesian inference (BI) might be biased in favor of a false tree was surprising and intriguing, because we -- like most people in the field -- had assumed that BI would have the desirable statistical properties of ML (e.g., nearly unbiased inference and statistical consistency -- convergence on the true tree with increasing support as the amount of data grows and the evolutionary model is correct, etc.). So we began doing experiments to rigorously explore the nature of the bias and its causes. When we found that BI was statistically inconsistent and the cause was integrating over branch lengths, we knew this result would be controversial, so we wanted to be sure the experiments were truly airtight. We supplemented our initial simulations with analyses of empirical data, with simulations under a wide variety of conditions using all types of priors, as well as mathematical and numerical analyses to clearly demonstrate the reasons for the bias. We also developed software that was identical to fully Bayesian MCMC except that it does not integrate over branch lengths; this method is not subject to the bias that BI displays, clearly demonstrating the cause of the bias.

Why did you send this to PLoS One?

Why did we submit to PLoS One? We think this paper has profound implications for phylogenetic practice and theory, and we want it to have a wide audience. Our experience with the review process in phylogenetic methods, unfortunately, is that many reviewers evaluate manuscripts based on whether or not the results confirm their world-view. This is a legacy of decades of internecine warfare in the field between the adherents of different methodological camps. We write papers in other fields, and while peer-review always has its ups and downs, our experience in phylogenetics is unusual in that solid papers are often rejected for philosophical reasons rather than for reasons of scientific validity and quality. We know this paper will be controversial, and we didn't want it to be shot down in the review process for partisan reasons. PLoS One seemed like the perfect place to get the paper out and let the scientific community evaluate whether the experiments are convincing or not.

This is our first time publishing in PLoS One. I confess to being a little bit anxious that the paper will be lost in the great tide of papers published in the journal. We know our paper is very strong -- I think it's perhaps the most convincing and complete analysis of any problem I've ever published -- so we're confident that the work can have an impact, as long as the attention of readers in the field is drawn to it.

Where is the other author these days?

Bryan is now a postdoc in Andy Kern's lab at Dartmouth.

Kolaczkowski, B., & Thornton, J. (2009). Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics. PLoS ONE, 4 (12). DOI: 10.1371/journal.pone.0007891

ARPA-E funding opportunities in transformational energy research projects

An announcement of possible interest:

DOE has announced a second round of funding opportunities with $100 million in Recovery Act funding to be made available for transformational energy research projects through Advanced Research Projects Agency-Energy (ARPA-E). Concept papers limited to 5-6 pages (depending on individual areas of focus) are due January 15, 2010. Awards may range from $500K - $10M (average award $1M - $5M) with a performance period of 24 - 36 months. Full proposals will be by invitation and will be due 31 days after notification.

Areas of focus included under this announcement:

Electrofuels. (DE-FOA-0000206). ARPA-E is seeking new ways to make liquid transportation fuels - without using petroleum or biomass - by using microorganisms to harness chemical or electrical energy to convert carbon dioxide into liquid fuels.
Innovative Materials & Processes for Advanced Carbon Capture Technologies (IMPACCT). (DE-FOA-0000208) The objective of this topic is to fund high risk, high reward research efforts that will revolutionize technologies that capture carbon dioxide from coal-fired power plants, thereby preventing release into the atmosphere.
Batteries for Electrical Energy Storage in Transportation (BEEST). (DE-FOA-0000207). In this topic, ARPA-E seeks to develop a new generation of ultra-high energy density, low-cost battery technologies for long-range, plug-in, hybrid electric vehicles and electric vehicles (EVs).

Relevant Dates

Concept Paper Registration Deadline: January 15, 2010
Concept Paper Upload Deadline: January 15, 2010 at 5:00pm (EST) (2:00pm PST)*
Full Application Submission Deadline: TBD
*Submitting Division must register in advance at, and submit electronically to ARPA-E eXCHANGE.

Relevant Links

ARPA-E Home: http://arpa-e.energy.gov/
Funding Opportunities: https://arpa-e-foa.energy.gov/
Frequently Asked Questions: https://arpa-e-foa.energy.gov/FAQ.aspx
Key Documents: http://arpa-e.energy.gov/keydocs.html
Direct questions to: ARPA-E@doe.gov

Monday, December 21, 2009

Creative Commons Licenses adopted at Palo Alto High School

Cool - Creative Commons spreading even to Palo Alto High School - See Paly Voice - Creative Commons Spotlight. According to the article, multiple Palo Alto High publications have adopted CC licenses and are the first high school publications to do so. Good call I say. Plus check out the article which discusses other diverse uses of CC including Nine Inch Nails, PLoS, Wikipedia, and others. Of course, this might have something to do with Lawrence Lessig being from the neighborhood, but that's OK by me.

Saturday, December 19, 2009

US government seeks input on Open Access policies

Quick one here. For all interested in Open Access. Below are some excerpts from an email I received from the folks at PLoS Computational Biology. The main point: the White House Office of Science and Technology Policy is seeking input on broadening public access to publicly funded research ...

The White House Office of Science and Technology Policy has recently invited comment on broadening public access to publicly funded research and they want to hear from you. Contributions may be posted to their blog at: http://blog.ostp.gov/2009/12/10/policy-forum-on-public-access-to-federally-funded-research-implementation/

Their Request for Information (RFI) lasts for just 30 days and expires on 7 January 2010, so we'd like to inform you about this important effort and encourage you to get involved in the discussion. This is an opportunity for us to shape a broader public access policy - how it should be implemented, what type of technology and features are needed, and how to manage it.

There are 3 main topics where the administration would appreciate your input (they also welcome general comments) and each one is open for a set period of time:

1. Implementation - expires 20 December 2009 (i.e. on Sunday). Which Federal agencies are good candidates to adopt Public Access policies? What variables (field of science, proportion of research funded by public or private entities, etc.) should affect how public access is implemented at various agencies, including the maximum length of time between publication and public release?

2. Features and Technology - 21-31 December 2009. In what format should the data be submitted in order to make it easy to search and retrieve information, and to make it easy for others to link to it? Are there existing digital standards for archiving and interoperability to maximize public benefit? How are these anticipated to change?

3. Management - 1-7 January 2010. What are the best mechanisms to ensure compliance? What would be the best metrics of success? What are the best examples of usability in the private sector (both domestic and international)? Should those who access papers be given the opportunity to comment or provide feedback?

Hat tip to Karla Heidelberg, Carl Boettiger, and many others who emailed me about this to suggest I post something ...

Related things worth looking at:

Federal register announcement about this
Slashdot story on this topic
Alliance for Taxpayer Access
Peter Suber's Open Access News on the topic
Bora on the topic

Friday, December 18, 2009

#OpenAccess help needed - best way to publish conference proceedings in an OA manner?

To all Open Access fans or gurus out there. I am writing at the request of a colleague who is looking into ways that one might switch from publishing conference papers in a closed access way to a more Open Access way.

Does anyone out there know if there are good Open Access publishing services that would enable one to do this? Any information about possible publishers, costs associated with doing this, etc would be helpful. Thanks in advance.

NOTE ADDED: Perhaps most importantly - we are looking for systems that would include the possibility of publishing printed versions of the proceedings ...

Important new rules for NIH grant submissions ..

Just got this and thought it would be of interest to many people ...

Dear NIH principal investigators, signing officials, and applicants,

Are you planning to submit an NIH grant application? If so, please note that all applications intended for due dates on or after January 25, 2010* require the use of new forms and instructions. Major changes include:

· Restructured forms to align with review criteria

· Significantly shorter page limits

These changes apply to all competing applications, so whether you are submitting a new, renewal, resubmission or revision, you must take action now to ensure a successful submission!

1. Return to the updated funding opportunity announcement or reissued parent announcement to download the new application package and instructions.

– FOAs are in the process of being updated. See timeline for more information.

2. Be sure to choose the correct forms. Applications intended for due dates on or after January 25 require new forms.

– For Electronic SF 424 (R&R): ADOBE-FORMS-B

– For Paper PHS 398: Revision date “June 2009”

3. Read the updated FOA and new application instructions carefully

For more details, see the Enhancing Peer Review Web site, which has a page dedicated to the upcoming application changes, as well as a number of additional resources including:

· A short video overview of the changes

· FAQs

· List of related policy notices

· A Training and Communications Resources page, and more.

Sincerely,

NIH Office of Extramural Research

Division of Communications and Outreach

* Applicants eligible for continuous submission who are submitting R01, R21, and R34 AIDS applications should use the old SF 424 (R&R) ADOBE-FORMS-A on or before February 7, 2010 and the new SF 424 (R&R) ADOBE-FORMS-B thereafter. Non-AIDS applications from applicants eligible for continuous submission need to use ADOBE-FORMS-A on or before January 24, and the ADOBE-FORMS-B on or after January 25, 2010.

Thursday, December 17, 2009

U. of California seeking proposals on UC-Industry collaborations

Just got this email that might be of interest to some:

The University of California Office of Research and Graduate Studies is pleased to announce the spring 2010 UC Discovery Grant Request for Proposals.

The University of California Discovery Grant opportunity (UCDG) promotes collaborations between UC researchers and industry partners in the interest of supporting UC researchers and trainees, strengthening the state’s economy, and serving the public good. The UCDG is a matching grant mechanism; research projects are jointly funded by a UC Discovery Grant and a required industry matching contribution.

All applicants must submit a Notice/Letter of Intent (LOI) between January 11-February 12, 2010. Full proposals are due on March 2. LOIs and proposals must be submitted using the online proposal system proposalCENTRAL https://proposalcentral.altum.com/. Please refer to the program website for the most up-to-date information: http://www.ucop.edu/ucdiscovery/ . Detailed LOI and Application submission instructions will be available at the website above and on proposalCENTRAL at the beginning of January.

Please circulate this announcement widely.

Monday, December 14, 2009

So cool - CoPI/colleague of mine Jessica Green picked for TED2010

I am so incredibly psyched that my colleague, collaborator and friend Jessica Green was picked for the TED2010 conference. See the press release here.

Jessica is a Microbial Ecologist at U. Oregon and has a diverse background in engineering, biology, physics and other things.

And she is both brilliant and cool. They could not have picked better. Way to go Jessica.

Want to know more about her work? Watch this video:

Saturday, December 12, 2009

Nice Darwin Art at #UCDavis Evolution/Ecology Dept.

For more on this see The Face of Darwin where K. Garvey explains the history of the mural in more detail.

Friday, December 11, 2009

Great call for more openness in biology discussions by Steven Wiley in the Scientist

An article after my own heart ... Steven Wiley has written a column in the Scientist (Speak Your Mind: The Scientist [2009-12-01]) that speaks both to me and for me. In it he discusses the need for biologists to be more public about their opinions about their work and that of others.

He says, for example

Recently, I attended a conference on biofuel development that included a discussion of the feasibility of deriving fuels from algae. In the open meeting, only a few biologists voiced an opinion, all stated very politely. In private, however, the opinions that I heard were invariably strong and contentious, and few people agreed with what appeared to be the general consensus. It seemed that most of the meeting participants were unwilling to let their viewpoints be publicly known.

I have witnessed the exact same phenomenon and find it disheartening. To help build science and biology we need to be more open about discussing ideas. This pattern of whispering behind the scenes or standing behind anonymity drives me a bit crazy and it is one of the reasons I have become a science blogger and tweeter and such.

Wiley wraps up his discussion by saying

However, a comment is only really useful when the author is identified, because it allows you to evaluate its credibility. Besides, why should anyone respect an opinion that even the author is not willing to claim? And being honest does not mean being insulting or nasty. Open and honest debate has always been necessary for the best science, but mutual respect between the participants is necessary to make it work.

I agree with this too. I have slipped occasionally in being too nasty in comments but am trying to get that under control. But overall, the importance of openness far outweighs the risk of sometimes being offensive. So I am calling for others in biology - start a blog - start tweeting - ask more questions at meetings - get up and say what you think - sign your name to reviews - sign your name to comments on the web - be more open. It will be good for all of us.

Monday, December 07, 2009

Amazing post-doc fellowship opportunity: Center for population biology at #UCDavis

No bias here --- but this really is an incredible post doc opportunity in population biology here at U. C. Davis. See below:

EFFECTIVE: December 7, 2009

DEADLINE: January 20, 2010

POSTDOCTORAL FELLOW IN POPULATION BIOLOGY--The Center for Population Biology at UC Davis invites applications for a Postdoctoral Fellowship in Population Biology, broadly defined to include ecology, phylogenetics, comparative biology, population genetics, and evolution. We particularly encourage applications from candidates that have recently completed, or will soon complete, their PhD. The position is for TWO YEARS, subject to review after one year, and can begin as early as 1 July 2010. It has an annual salary of $38,000 plus benefits, and $6,000 per annum in research support. The Fellow will be a fully participating member in the Center for Population Biology and will be expected to have an independent research program that bridges the interests of two or more CPB research groups. We strongly encourage candidates to contact appropriate faculty sponsors before applying. We also ask that each Fellow teach a multi-day workshop, discussion or lecture series that is of broad interest to the community of population biologists at UC Davis; faculty sponsors or the Director of CPB, Jay Stachowicz, can provide additional input on this aspect of the fellowship. For samples of past workshop abstracts and more information about UC Davis programs in population biology, see http://cpb.ucdavis.edu/jobs.htm.

ONLINE APPLICATION: Interested candidates should submit a cover letter, CV, a short (1-2 page) description of research accomplishments, a short (1-2 page) description of proposed research including potential faculty mentors, a brief description of their proposed workshop/minicourse, and copies of two publications at http://www2.eve.ucdavis.edu/jobs/ all as PDFs. We require 3 letters of recommendation. The referees you list in the online application will receive an automatic notification from our system instructing them how to directly upload letters to our website. Refer to the on-line instructions for further information. For full consideration, applications should be received by January 20, 2010. The University of California is an affirmative action/equal opportunity employer with a strong institutional commitment to the development of a climate that supports equality of opportunity and respect for differences. E-mail questions to gradcoordinator@ucdavis.edu.