Friday, April 30, 2010

Quick one here: nice pic of "Earth From Mars" from NASA

Just a wee bit humbling here. H/T to Andy Fell.

New hotel at #UCDavis is quite nice

Here is a little video I made when I went to pick up Rebecca Skloot for her talk at UC Davis last week. She was staying at the new UC Davis hotel (University of California Davis Hotel | Hyatt Place), which is on campus right near the Mondavi Center. The hotel seemed very very nice.

Just thought I would post the vid -- seems like it is a good addition to the area hotels with the one exception that it is not in downtown.

Protect that biodiversity - wear protection

Here are some pics of some biodiversity protecting devices seen around here ..

Friday, April 23, 2010

47170 papers referring to #HeLa in PubMed Central (re talk today by @rebeccaskloot )

Wow - did a search of PubMed Central for the keyword HeLa because Rebecca Skloot is here today to talk about her book "The Immortal Life of Henrietta Lacks" which is about the woman behind the HeLa cells.

And there are a whopping 47170 papers in PMC that come up. I am sure some of these will not be about HeLa cells but most seem to be.

Here is a link to the results: hela - PMC Results

That's pretty incredible - 47000 or so papers, freely available, all with some reference to this one human cell line.
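
For anyone who wants to repeat that count from a script rather than the web page, here is a minimal sketch in Python using NCBI's E-utilities `esearch` endpoint. The helper names (`build_pmc_count_url`, `parse_count`, `fetch_count`) are made up for illustration, and the number returned today will differ from the 47170 above since PMC keeps growing.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pmc_count_url(term):
    """Build an esearch URL asking PMC only for the number of hits."""
    query = urllib.parse.urlencode({"db": "pmc", "term": term, "rettype": "count"})
    return f"{ESEARCH}?{query}"

def parse_count(xml_text):
    """Pull the integer out of the <Count> element of an esearch reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))

def fetch_count(term):
    """Run the search over the network (hypothetical helper; needs internet)."""
    with urllib.request.urlopen(build_pmc_count_url(term)) as reply:
        return parse_count(reply.read())
```

Calling `fetch_count("hela")` should give the current number of PMC papers matching the keyword, assuming the service is reachable.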

Tuesday, April 20, 2010

Yuck - rRNA databases restrictions on sharing/reuse create major complications

Well, this is annoying. I started to look at the sharing policies for data from various ribosomal RNA databases. And boy was I surprised. One database, the RDP, run out of Michigan State University, has a page with information about their policies. The page says the following:

By downloading data ("Data") from the Ribosomal Database Project ("RDP"), you agree as follows:
Data are copyrighted by the Michigan State University Board of Trustees. 
You may use the Data for your own non-commercial research purposes, and may make derivatives from the Data for your own non-commercial research purposes. All other rights are reserved by Michigan State University. 
You may not sell the Data or any derivatives you prepare from the Data, nor may you provide the Data or derivatives you prepare to any third party for commercial purposes. 
MSU makes no warranty, express or implied, to you or to any other person or entity, including without limitation the implied warranties of merchantability or fitness for a particular purpose of the data. MSU will not be liable for special, incidental, consequential, indirect or other similar damages, even if MSU or its employees have been advised of the possibility of such damages. 
You will only distribute the Data or derivatives with copyright notices. 
If you publish from the Data, you will acknowledge the contribution of Michigan State University and the Ribosomal Database Project 
This cannot be right? Most of the data came from GenBank so certainly they cannot Copyright it. Now it may be that they are referring to sequence alignments and other derivatives of the raw data but this implies that all the data in the RDP is Copyrighted.

Mind you, I do not like the policy even if it is just for the derivatives of the data (e.g., alignments) since this will certainly make some things very difficult in terms of publishing.  For example, if I use an alignment from RDP, how do I provide the alignment when I publish a paper? If I provide it, do I only provide it to non commercial entities? Does that mean, in essence, commercial entities would not be allowed to see alignment figures?

I get that people do not want people to download and then redisplay all of their content, thereby in essence possibly killing the original database.  But Copyrighting all the data in the database?  Even data that is not theirs?  Is this just a scare tactic of some sort?  A mistake? I cannot tell.  There must be better ways to prevent someone from redisplaying the entire database structure and content without such severe tactics. 

So - is this an issue just with RDP? Turns out - no. The SILVA database in Europe has some restrictions too:

The SILVA databases and services/tools offered at are FREE FOR ACADEMIC USE. All downloads can be used, modified and redistributed within the academic environment without any limitations.

Users from NON-ACADEMIC/COMMERCIAL ENVIRONMENTS can also directly access all downloads including the results of the SILVA Webaligner (SINA) but only for limited/temporary use (only for test purposes).

If you are interested in unlimited usage of the SILVA databases/services or parts of them within a non-academic/commercial environment, please send an e-mail to ....
Though thankfully they do not seem to be trying to Copyright or reserve rights for other people's data. They simply refer to downloads from their database and what one can do with such downloads. They never say they own in any way the data itself.

Fortunately greengenes seems to have no restrictions on the use of data or anything downloaded from there.  Though I am still looking into this.

I think it is time for rRNA researchers to think carefully about using data/alignments/etc from databases like SILVA and RDP.  If one uses an alignment from one of these databases it is possible one would be violating the DB policies if one released the alignment as part of a paper.  Yet, if one uses the alignment in the paper, one should release it.  So seems better to seek out and use fully open datasets and alignments and other results.

Below are some discussions relating to some tweets I posted about this issue yesterday:

Sunday, April 18, 2010

After 20 yrs of email my first "reply all" SNAFU; reply all apology etiquette?

Oh well. I had a good run. I have been really careful over the years with avoiding the "reply all by accident" mistake that has created so much comedy and pain to others. I am not sure how I have avoided it so well - in part I am careful - but in part clearly just lucky. And I have witnessed some pretty pretty funny reply all mistakes as I am sure have most people out there. The funniest was one by a well known evolutionary biologist who made a bit of a faux pas in replying to a message sent to the evoldir mailing list in the early 1990s. After witnessing what happened in particular with that one, I did become more careful with replying.

So then yesterday, while at Stanford for an Evolutionary Genomics Symposium, I was tweeting and friendfeeding the talks (will post notes in a little while) and I was doing this all on my iPhone. I like the iPhone, but typing notes on it, and looking up URLs of papers, and copying and pasting things to FriendFeed or Twitter, is not that easy. But I was trying to keep up. And I logged in to my Gmail to look up the schedule, and saw an email message from Jim Bristow, the Deputy Director of the Joint Genome Institute, that I had meant to respond to earlier.

In his email Jim requested, quite reasonably, that people that have projects that are done in collaboration with (or entirely by) the Joint Genome Institute, add a little statement to their Acknowledgements in their publications, regarding the funding for the JGI.

We had had a paper come out Friday supported by DOE and done in part at the JGI, and I looked at the paper on the PLoS One site and we had the right acknowledgement in our paper, thanks to a suggestion from David Gilbert who handles Public Affairs at the JGI.

So I wrote a little email response to Bristow's email. My email was brief:
I think we did it right here
Just a link to the paper and a suck up statement telling Jim that I think we did the Acknowledgement the way he wanted it.

All sounds pretty boring right? That was until I clicked send, which sent the email, somehow, to all the people on the DOE mailing list - basically all people that have ever worked with JGI. And if you reread my message, it sounds like I am bragging "did it right" about how great our paper was.

I have already gotten three responses, all thinking I meant that our paper was "right".

So here is my question for everyone. What do I do now? Do I send another message to all saying "Oops - I did not mean to send that to everyone and I was not bragging about our paper"? Or do I lie low and let it blow over?

Thursday, April 15, 2010

Experiments in scientific sharing contd: Biotorrents

Yesterday a paper from my lab (by Morgan Langille, with me as co-author) was published in PLoS One: BioTorrents: A File Sharing Service for Scientific Data

In it we describe a new website dedicated to the sharing of biology related files via BitTorrent, the popular distributed file sharing system. The abstract sums things up pretty well:
The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at
Personally, I am not sure if Biotorrents is going to end up being used extensively. I hope so. I think it is a great idea of Morgan's. But more importantly, I believe it represents something we need more and more of in the "Open Science" movement. We need experimentation with all sorts of methods for improving sharing. The sharing of large electronic files, such as datasets of some kind (e.g., sequences, pictures, mass spec results, etc), is rapidly becoming a major complication in scientific research. If one publishes a paper on whatever, or even before one publishes a paper, sharing the data associated with the work is not always simple. Biotorrents could help in this in that sharing files via BitTorrent is very simple and easy. And if some data sets are of great interest, and if a lot of people start using Biotorrents, then the download and distribution of the data sets of interest will get faster as more and more people serve as hosts to contribute to the distributed file sharing.
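
The "built-in error checking" mentioned in the abstract comes from how BitTorrent handles files: a file is cut into fixed-size pieces, the torrent records a SHA-1 hash for each piece, and a downloader re-hashes every piece it receives. Here is a toy sketch of that idea in Python. It is not the BioTorrents code or a real BitTorrent client; the function names and the tiny piece size are made up for illustration.

```python
import hashlib

def piece_hashes(data, piece_size):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]

def bad_pieces(received, expected_hashes, piece_size):
    """Return the indexes of pieces whose hash does not match the expected one."""
    got = piece_hashes(received, piece_size)
    return [i for i, (g, e) in enumerate(zip(got, expected_hashes)) if g != e]

original = b"ACGT" * 64          # stand-in for a sequence file (256 bytes)
hashes = piece_hashes(original, piece_size=32)

corrupted = bytearray(original)
corrupted[100] ^= 0xFF           # flip one byte, as a bad transfer might
```

Running `bad_pieces(bytes(corrupted), hashes, 32)` flags only the piece containing the flipped byte, so only that piece would need to be fetched again, from any host that has it.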

If you want to learn more about Biotorrents, the best place to go is to Morgan's blog "Beta Science". In particular you should read "An interview with the creator of Biotorrents" where he interviews himself.

Also, Janet Fang of Nature News has just written a brief post on Biotorrents: "Biotorrent aims to open data sharing floodgates" where they quote me and Morgan. I particularly like the ending:
“Someone could download all the Nature papers and post them there, but we’re not encouraging that,” Eisen jokes. All PLoS papers are already on BioTorrents.
More on the web is coming out regarding Biotorrents and I will try to post some links here, including to some slightly older stuff

Some links:
FriendFeed Search for Biotorrents

Older discussion on FriendFeed by Morgan et al.

Tuesday, April 13, 2010

Davis, CA schools and their class size issues

Cross posting here from my "normal" non science life as a parent and resident of Davis, CA. This video was made by Hal and Carin Sloane, who are both neighbors and friends of mine. It is part of a fundraising effort for the Davis Schools Foundation and a general awareness raising campaign about the effects of budget cuts on K-12 education.

Davis Schools Foundation "Class Size" from Hal Sloane on Vimeo.

Things to do to wish "Happy Birthday" to my brother, Michael Eisen

Just to make sure people know I am posting here happy birthday wishes to my brother, Michael Eisen. If you want to get him something, I suggest doing one/all of the following things
If people have any other suggestions about what to do in honor of Dr. Michael Eisen, please post ...

Friday, April 09, 2010

Coming to UCDavis, 4/23, @RebeccaSkloot discussing #HeLa book "Immortal Life of Henrietta Lacks"

Well, the date is fast approaching and I want to at least get people in the Davis area ready. Author/journalist/blogger Rebecca Skloot will be speaking at UC Davis about her new book "The Immortal Life of Henrietta Lacks." I know from personal experience, the book is simply amazing. However, don't trust me, trust the world out there: the reviews and press she has been getting are unlike those for really any book recently, including:
The Colbert Report: Rebecca Skloot
And somewhat amazingly, she will be coming to Davis (I got her to commit to coming when I read a preprint of the book - before she became such a big shot). The details are below - I will be posting some more over the next few weeks about the book but just wanted to get the word out.
  • Title: “The Immortal Life of Henrietta Lacks (aka HeLa): The History and Ethics of Research on Human Biological Materials”
  • Date: 4/23/2010
  • Time: 4:00-6:00 p.m.
  • Open to the public with talk plus questions plus book signing. Books will be available to purchase.
  • Location: ARC Ballroom
  • Sponsor: UC Davis Genome Center, Science and Technology Studies Program, University Writing Program, and Davis Humanities Institute.
  • Contact: Jonathan Eisen

Tuesday, April 06, 2010

Most important paper ever in microbiology? Woese & Fox, 1977, discovery of archaea

Well, today in my "Microbial phylogenomics" class at UC Davis we are discussing what I think might be the most important paper (well, actually, series of papers) in the history of microbiology. The papers are the ones where Carl Woese, George Fox and colleagues outline the evidence for the existence of a "hidden" third major branch in the tree of life - what is now known as the archaea. The evidence for this third branch was first laid out in a series of papers in 1977 including:
Now Woese, Fox and others in Woese's group had been leading up to these publications in ways for years (I note, there were some pretty incredible people involved in these studies in the years before 1977 too including Mitch Sogin, now at MBL, David Stahl, Chuck Kurland, Norm Pace, etc but that is another story). They had been determining the nucleotide sequences of small fragments of rRNAs from different species, especially from different organisms that did not have nuclei - the so-called "prokaryotes". And they were using these sequences to infer the phylogenetic relationships among these microbes.

Consider for example, the paper by SJ Sogin et al in 1972 "Phylogenetic measurement in procaryotes by primary structural characterization" (Sogin SJ, Sogin ML, Woese CR. J Mol Evol. 1971;1(1):173-84). This paper laid out some of the arguments for why rRNA sequence information might re-write our concepts of classification of prokaryotes. From this and many of the other papers from Woese and Fox and others before 1977 it had been shown that one could use rRNA sequence information to more accurately infer relationships among "prokaryotes" than had been done previously with other types of information. Today this notion that we can use sequence information to infer the evolutionary history of microbes is taken for granted but back in the early 1970s it was not. And in addition, many people probably just did not care too much about the exact details of microbial phylogeny and classification.

But this changed in 1977 with that series of papers outlined above. What these papers showed was that hidden beneath everyone's noses was a separate, previously unknown, major split in the prokaryotes into two distinct lineages. One of these included all the standard bacteria people were familiar with like E. coli and B. subtilis and one of them included some pretty weird wacked out bugs that thrived in extreme conditions. For example, look at the phylogenetic tree from Fox et al.

This tree (made using a distance based clustering algorithm where the distances represent a measure of the similarity of the catalog of short oligonucleotides found in each species) shows the normal bacteria on one side (down below) and methanogens and their relatives on another side. I like the last line of the abstract, which to an evolutionary microbiologist can be considered equivalent to Watson and Crick's "It has not escaped our notice ...". Here Fox et al. say "These organisms appear to be distantly related to typical bacteria".

The Balch et al. paper has similarly interesting, cool nuggets. However, alas, it is not available in PubMed Central as are the other two papers, so here I am not focusing on it. What is great though is that the other two papers are freely available to anyone to read in PubMed Central and also at the PNAS web site. Yay for access. Too bad the other paper is not freely available.

Anyway, fortunately, the most critical of these papers is the Woese and Fox paper from PNAS, which is freely available. And it is in this paper that the full argument is laid out. Consider the abstract:
ABSTRACT A phylogenetic analysis based upon ribosomal RNA sequence characterization reveals that living systems represent one of three aboriginal lines of descent: (i) the eubacteria, comprising all typical bacteria; (ii) the archaebacteria, containing methanogenic bacteria; and (iii) the urkaryotes, now represented in the cytoplasmic component of eukaryotic cells.
In this paper they lay out the evidence for the existence of at least three main branches in the Tree of Life. Interestingly, for the phylogenetically minded people out there, they do not show an evolutionary tree in the paper. What they show is what is known as a similarity matrix (the inverse in essence of the distance matrices many people may be used to seeing) where a score is given for the similarity between organisms in the fingerprints of their 16S/18S rRNAs.

If one scans through the matrix one can clearly see three clusters of similarity scores
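
To make the similarity-versus-distance point concrete, here is a small Python sketch. It assumes the association coefficient is computed as S_AB = 2*N_AB / (N_A + N_B), with N counting the nucleotides in catalog oligonucleotides of six or more bases; treat that formula as my reading of the method rather than a quote. The catalogs below are invented toy data, not real rRNA fingerprints, the function names are made up, and taking 1 - S as the distance is just one simple way to flip similarity around.

```python
from collections import Counter

def long_oligos(catalog, min_len=6):
    """Keep oligos of at least min_len bases, with multiplicity."""
    return Counter(o for o in catalog if len(o) >= min_len)

def similarity(cat_a, cat_b, min_len=6):
    """S_AB = 2*N_AB / (N_A + N_B), with N measured in nucleotides."""
    a, b = long_oligos(cat_a, min_len), long_oligos(cat_b, min_len)
    n_a = sum(len(o) * n for o, n in a.items())
    n_b = sum(len(o) * n for o, n in b.items())
    n_ab = sum(len(o) * n for o, n in (a & b).items())  # oligos found in both
    return 2 * n_ab / (n_a + n_b)

# Invented toy catalogs (hypothetical, for illustration only).
catalogs = {
    "bacterium_1": ["AACCUG", "CCUAAG", "UUCCAAG", "ACG"],
    "bacterium_2": ["AACCUG", "CCUAAG", "UACCAUG"],
    "methanogen_1": ["AUUCCG", "CCAUUAG", "UACUCG"],
}
names = list(catalogs)
sim = {(x, y): similarity(catalogs[x], catalogs[y]) for x in names for y in names}
dist = {pair: 1 - s for pair, s in sim.items()}  # high similarity = short distance
```

In this toy example the two "bacteria" share most of their long oligos and so score high against each other, while the "methanogen" shares none with either, which is the kind of block pattern one looks for when scanning the real table.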

From this table, Woese and Fox infer the existence of three primary branches in the tree of life. This is laid out in a few paragraphs starting with one at the bottom of page 5088.
A comparative analysis of these data, summarized in Table 1, shows that the organisms clearly cluster into several primary kingdoms. The first of these contains all of the typical bacteria so far characterized .... (lots of names here) ... It is appropriate to call this urkingdom the eubacteria.
And then a second paragraph discusses the second group
A second group is defined by the 18S rRNAs of the eukaryotic cytoplasm-animal, plant, fungal, and slime mold (unpublished data). ... (They call this lineage the urkaryotes).
And then the third paragraph lays out the revolution:
Eubacteria and urkaryotes correspond approximately to the conventional categories "prokaryote" and "eukaryote" when they are used in a phylogenetic sense. However, they do not constitute a dichotomy; they do not collectively exhaust the class of living systems. There exists a third kingdom which, to date, is represented solely by the methanogenic bacteria, a relatively unknown class of anaerobes that possess a unique metabolism based on the reduction of carbon dioxide to methane (19-21). These "bacteria" appear to be no more related to typical bacteria than they are to eukaryotic cytoplasms. Although the two divisions of this kingdom appear as remote from one another as blue-green algae are from other eubacteria, they nevertheless correspond to the same biochemical phenotype. The apparent antiquity of the methanogenic phenotype plus the fact that it seems well suited to the type of environment presumed to exist on earth 3-4 billion years ago lead us tentatively to name this urkingdom the archaebacteria. Whether or not other biochemically distinct phenotypes exist in this kingdom is clearly an important question upon which may turn our concept of the nature and ancestry of the first prokaryotes.
Mind you, the whole paper is worth reading, but those three paragraphs lay out a revolution in how one thinks about the tree of life. Now admittedly, some of our notions of the tree of life have changed since 1977 and there is much more of a feeling of mixing and merging between branches than was appreciated back then. And some definitely feel that the archaebacteria (or archaea as they are known today) are not per se a third branch in the tree of life but rather that there are four or five major branches and that archaea may not in fact be a "monophyletic grouping". But whether you think archaea truly represent a third branch in the tree of life or not, this paper fundamentally altered how we think about the tree and about microbes. The work was even written up in the New York Times and got a lot of press (not that that is proof of anything - but it got microbial phylogeny into the public's mind).

I think it is worth having all biology students read and understand this paper. Which is why I now try to cover it in basically all classes whenever I can. I could go on and on, but I will simply end with their last paragraph:
With the identification and characterization of the urkingdoms we are for the first time beginning to see the overall phylogenetic structure of the living world. It is not structured in a bipartite way along the lines of the organizationally dissimilar prokaryote and eukaryote. Rather, it is (at least) tripartite, comprising (i) the typical bacteria, (ii) the line of descent manifested in eukaryotic cytoplasms, and (iii) a little explored grouping, represented so far only by methanogenic bacteria.

Woese CR, & Fox GE (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences of the United States of America, 74 (11), 5088-90 PMID: 270744

Fox GE, Magrum LJ, Balch WE, Wolfe RS, & Woese CR (1977). Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proceedings of the National Academy of Sciences of the United States of America, 74 (10), 4537-4541 PMID: 16592452

Balch WE, Magrum LJ, Fox GE, Wolfe RS, & Woese CR (1977). An ancient divergence among the bacteria. Journal of molecular evolution, 9 (4), 305-11 PMID: 408502

Friday, April 02, 2010

Evolution & Genomics symposium; Stanford; 4/16-4/17; open to all; gonna be good

Now this is gonna be good.  Stanford. April 16-17, 2010.  Evolution and Genomics Symposium including:

Prof. Andrew Clark, Cornell University delivers the David Starr Jordan Memorial Lecture, titled "How is the human population explosion affecting the genetics of complex disease?"

Prof. Johanna Schmitt, Brown University delivers the John Thomas Memorial Lecture, titled "Evolutionary genomics of plant responses to climate change."

And many other talks including:
  • Graham Coop, U.C. Davis, "Geographic patterns of adaptation in humans" 
  • Jonathan Eisen, U.C. Davis, "A phylogeny-driven genomic encyclopedia of bacteria and archaea" 
  • Hunter Fraser, Stanford, "Adaptive evolution of gene expression" 
  • Jessica Green, U. Oregon, "Biodiversity theory and metagenomics-based biogeography" 
  • Stephen Palumbi, Stanford, "Genomics of speciation and adaptation in the sea" 
  • Brian Simison, California Academy of Sciences, "Molluscan mitochondrial genomics" 
  • Jay Storz, U. Nebraska, "Genomics of high-altitude adaptation in vertebrates" 
  • Ward Watt, Stanford, "Evolutionary functional genomics of ecologically accessible species" 

Gonna be fun to go back to my old haunting grounds (did that little PhD thing there).