The Tree of Life: Some quick comments on "Giant viruses coexisted w/ the cellular ancestors & represent a distinct supergroup"

Thursday, September 13, 2012

Got asked on Twitter about this paper:

BMC Evolutionary Biology | Abstract | Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya

Hmm... @carlzimmer @phylogenomics? Weigh in? RT @diana_yates_: Study of giant viruses shakes up the tree of life: bit.ly/NqLX2u
— Ed Yong(@edyong209) September 13, 2012

I answered briefly

@edyong209 @carlzimmer @diana_yates_ very unconvincing; no taxa in trees; no evidence FSF useful marker; Fig 3 strange; data not released ..
— Jonathan Eisen (@phylogenomics) September 13, 2012

Don't have time for a detailed blog post but here are some quick comments:

1. Giant viruses are fascinating and cool

2. I have done work connected to the topic of this paper and thus might not be considered fully objective. For example see

3. I see no evidence that the type of analysis that they do on protein folds is a robust phylogenetic method. Phylogenetic methods based on sequence alignments (which is what we focus on in my lab) have been tested and tweaked for some 50 years. There are 100s to maybe 1000s of papers on methods alone - not to mention the 1000s of papers using alignments for phylogenetics. I am not convinced that the analysis being done here of FFs and FSFs (fold families and fold superfamilies) is particularly robust. It seems interesting, certainly. But is it sound? I mean, I could build phylogenetic trees from cell size, from shape, from eye color, and from all sorts of other features. Those would all suck for certain. Protein folds - not sure about them. They almost certainly are prone to convergent evolution and I do not see any attempt in this analysis to deal with that issue.

4. The authors of the current paper do not show any taxa names on their trees - just colors for large groups of taxa (bacteria, archaea, eukaryotes and viruses). It is really not good practice to remove the taxon names. If they were there, the first thing I would do is look at the patterns within the groups they highlight. Do all the major phyla / kingdoms of eukaryotes, for example, come out looking as one would expect based upon other studies? Or are they all over the place? Same for bacteria and archaea. Not including taxa makes it nearly impossible to judge this paper positively. I could not find this information in supplemental data either.

5. They really should have released the data tables they used for the phylogenetic analysis. Don't know why they did not.

6. In Figure 3 with the rooting they have, either viruses are a subgroup of archaea or archaea are not monophyletic. Not a good thing in a paper trying to claim viruses represent a fourth grouping on the tree of life.
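Points 4 and 6 both come down to checking whether a labeled group forms a single clade on a rooted tree. Here is a minimal Python sketch of that check. The tree, the taxon names, and the topology are all made up for illustration (they are not data from the paper, which was not released), and the helper names `leaf_groups`, `clades`, and `is_monophyletic` are hypothetical.

```python
# Toy rooted tree as nested tuples. A leaf is a (taxon_name, group) pair;
# an internal node is a tuple of child nodes.
# Hypothetical topology: the viruses nest inside the archaeal lineage.
tree = (
    (("arch_A", "Archaea"),
     (("arch_B", "Archaea"),
      (("virus_A", "Viruses"), ("virus_B", "Viruses")))),
    ((("bact_A", "Bacteria"), ("bact_B", "Bacteria")),
     (("euk_A", "Eukarya"), ("euk_B", "Eukarya"))),
)


def is_leaf(node):
    """A leaf is a (name, group) pair whose first element is a string."""
    return isinstance(node[0], str)


def leaf_groups(node):
    """Return the group label of every leaf under this node."""
    if is_leaf(node):
        return [node[1]]
    groups = []
    for child in node:
        groups.extend(leaf_groups(child))
    return groups


def clades(node):
    """Yield this node and every subtree below it."""
    yield node
    if not is_leaf(node):
        for child in node:
            yield from clades(child)


def is_monophyletic(root, group):
    """True if some clade contains all leaves of `group` and nothing else."""
    total = leaf_groups(root).count(group)
    if total == 0:
        return False
    for clade in clades(root):
        groups = leaf_groups(clade)
        if len(groups) == total and all(g == group for g in groups):
            return True
    return False


results = {g: is_monophyletic(tree, g)
           for g in ("Archaea", "Viruses", "Bacteria", "Eukarya")}
print(results)
```

On this toy tree the viruses, bacteria, and eukaryotes each form a clade, but Archaea does not, because the smallest clade containing both archaeal leaves also contains the viruses. With taxon names on the published trees, the same kind of check could be run on the phyla within each group.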

Anyway - got to do some other things but just wanted to get some comments out there.

UPDATE 9/19 - some prior stories about the "fourth domain" and ancient viruses - to counter the notion in the press release for this paper that their findings "shake up the tree of life". Even if their specific inferences about viral evolution are correct, such inferences / conclusions have been made before.

41 comments:

Unknown 9/13/2012 6:04 PM
Hey. Thank you for reading the paper and responding. Don't understand why you'd publish something so quickly without even interviewing the researcher, but I'll ask him to respond.

D. Yates.
Christopher Hogue 9/13/2012 6:40 PM
I have gone through all of the FSF and FF papers of Caetano-Anolles and would suggest that his method is interesting, but there are a few cases where the clock assumption falls short. For example, disordered conserved ribosomal proteins like L15 cannot duplicate for re-use as a folded cold-shock domain can, not without a duplication in their binding partner. So his dating of ribosomal proteins seems too wide, and gets to a "proteins before ribosome" answer. I agree that no data for any of his papers is available, so the work is impossible to replicate without a rebuild of his infrastructure. I think it's an interesting method but it lacks "error bars".
gca 9/13/2012 8:40 PM
Jonathan, thank you for your candid assessment. Diana prompted a response so I am now finding my way to your blog, something I never do. We have been working on these kinds of approaches since 2000 and publishing our results since 2003. A focus on structure for phylogenetic analysis is indeed incipient but our methods are cladistic, very traditional, and their inception predates modern analysis of sequence. I agree that there is much to learn from molecular structure but disagree that folds are like cells or eye color. Cells and eye color express a multitude of traits at many levels of organization, and it would be premature to use those traits in global analyses of these kinds. However, folds have been studied and catalogued since Kendrew, and the bioinformatics suites that are used to make structural inferences are robust and advanced. Structural biology remains a strong field and structural genomics has expanded our horizons of the molecular world. I must admit, however, that our understanding of structure and disorder is limited, but the field is advancing.

So back to your comments about how sound the methodology is. It has a number of advantages over sequence. For starters, it does not violate character independence as much as sequence does. Sites in sequences by definition interact with each other as they establish secondary, supersecondary, tertiary, and quaternary molecular structure. The incorporation of this fact into models of sequence evolution is a grand challenge. In contrast, domain definitions can in some cases be quite precise and their cataloguing robust. So the structural census provides characters and taxa directly. This obviates the need for alignment, which is a second grand limitation of sequence analysis (as rightly pointed out by Morrison). If interested, you may find many other benefits of the methodology (phylogenetic inapplicables, taxon sampling, tree imbalance, domain rearrangements, etc) at http://www.frontiersin.org/Bioinformatics_and_Computational_Biology/10.3389/fgene.2012.00172/full. I think that the most important feature of domain structures is their high evolutionary conservation. This makes them unique for the deep exploration of phylogenetic relationships. In contrast, sequences change at a very dynamic pace and that is exactly their power. The trade-off is that they are limited in their ability to provide big pictures. Sober and Steel had very sound arguments about the informational (and philosophical) limitations of sequences that are worthy of careful analysis. (Continues in separate comment because of HTML character limitations) -Gustavo

gca 9/13/2012 8:41 PM
(continued) You also complain about not showing taxa names or making public the data. Trees are too big to make labels explicit and our intention in this paper is to show the global placement of viruses rather than distract with details. Trees of proteomes provide acceptable groupings that are not far away from those of other phylogenomic methods, as we have explicitly described in several papers in the past (since 2006). Russell Doolittle in 2005 also showed that folds provide acceptable trees of life, so it is not only my laboratory that is producing trees of proteomes, though we are one of the few that are providing trees of domains. In terms of our data, which is massive, we are in the process of updating the MANET database. We will provide data matrices, trees, search functions, functional annotations and much more. Unfortunately we do not have funding for the endeavor so we are bootstrapping with what we can and the effort is progressing slowly.

Finally, the rooting of the cellular world in Archaea that you note and that follows the early rise of giant viruses is for us remarkable. It has been consistently recovered in all our trees, regardless of genomic dataset, structural classification (SCOP, CATH), phylogenetic character (from folds to families, from high levels of gene ontology definitions to the lowest that are possible to handle), a focus on abundance or occurrence, and many other twists. It appears at odds with the canonical rooting but I caution that the issue of the rooting of the tree of life is complex and far from resolved. We simply provide the structural view, which now needs to be reconciled with sequence views.

I close by saying that a focus on the phylogenomic analysis of molecular features other than sequence should be a welcome addition to the bioinformatics toolkit, even if the concept may be unfamiliar. I think structure can positively complement evolutionary inferences derived from sequence, especially since the cladistics methods we use to analyze multistate taxa have also been repeatedly tweaked and discussed for decades. -Gustavo
Unknown 9/14/2012 7:27 AM
Dear Jonathan,

Thank you for engaging with the material. It takes time and effort to understand new approaches and I appreciate that you are doing so! I sincerely hope the conversation (between you and GCA) will continue. I think you will find it most fruitful.

Best regards
Claudiu Bandea 9/18/2012 12:35 PM
It is great to see that Gustavo and his colleagues (Nasir et al.) have extended their approach of inferring phylogenies (based on protein domain structures) from the cellular kingdoms, Archaea, Bacteria, and Eukarya, to viruses.

There are only two broad ways of thinking about the origin and evolution of viruses: they evolved from simple to complex, by increasing the size of their genome and the complexity of their proteome, or from complex to simple (reviewed here: http://precedings.nature.com/documents/3886/version/1).

The current prevalent view is that the viral lineages originated from simple genetic elements, before the origin of cells. According to this hypothesis, the mysterious ancestral viral elements evolved by acquiring new genetic material (including genes for components of translation machinery) into complex viruses whose genomes are several times larger than the genome of many symbiotic or parasitic cellular species.

On the contrary, the fusion model proposes that the viral lineages originated from parasitic or symbiotic cellular species that, in order to have full access to the host resources, including translation machinery, fused with their host cell, by a process in which their cellular membrane fused with that of their host. After synthesizing their specific molecules and replicating their genome within the host cytoplasm, these organisms regained a cellular organization and continued their development. This novel type of life cycle opened unique evolutionary opportunities for both viruses and their host cells.

Many extant viruses, including poxviruses and mimiviruses, start their life cycle by fusing with their host cells, which provides compelling evidence for the fusion model.

One of the most remarkable implications of the fusion model is that new viral lineages originated from diverse Archaea, Bacteria and Eukarya species throughout their life history, and that this process might still be active. Surprisingly, it appears that several parasitic cellular species are indeed evolving into new viral lineages.

The data from the Nasir et al. paper indicate that viruses have evolved by reductive evolution. These data represent a strong additional line of evidence supporting the fusion model and the hypothesis that the ancestors of viruses were cellular species.
Unknown 1/08/2013 6:29 PM
To start with, I am not going to comment on the phylogenetics algorithms, the way they are employed, or the conclusions that Gustavo draws. However, I will say something about the underlying data and the use of domains as molecular characters: on balance they are both more robust than others (1), but also potentially less informative (2).
Unknown 1/08/2013 6:40 PM
This comment has been removed by the author.
Unknown 1/11/2013 2:56 AM
I am not angry or frustrated; I am trying to shed light where there is darkness. My approach, or the wisdom of it, has not been described here and was never under discussion. You have confused my approach with that used in the Caetano-Anolles paper in this discussion; I am sorry if I confused things by also mentioning my own work in passing at the end. I have said several times our tree is not based on superfamily characters.

My initial comments were all regarding aspects of superfamily domains, within the context of the Caetano-Anolles paper; there was clearly a lack of understanding about superfamily domains and I was trying to be informative, as I have a great deal of expertise on the subject to share. Neither of us has done a very good job of sticking to that point.
Claudiu Bandea 1/12/2013 10:38 AM
I must say that I followed the exchanges between Jonathan, our host, and Julian with much interest. The reason is that unlike conventional publications, here in the Blogosphere it is difficult to hide from inconvenient questions or issues.

The focus of the discussion was on the merit of using protein domains (e.g. superfamily) vs sequences as molecular characters for determining phylogenetic relationships. Although this discussion developed in a post about the evolution of viruses, specifically, on a paper by Nasir et al. entitled “Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya,” unfortunately, it did not crystallize on this subject. Possibly, that’s because without clarifying the robustness of the methods used for inferring phylogenies, both Julian and Jonathan felt that it was not worth pursuing a productive discussion of the paper and its subject; or, possibly, they inadvertently got distracted and distanced themselves from such a discussion.

My goal here is to resurrect this discussion, and hopefully make progress in understanding the origin and evolution of viruses. And I’ll try to do that by bringing forward some biological and evolutionary principles that might direct, or at least help with, the interpretation of phylogenetic data, whether this data is based on protein domains or sequence characters.

As I mentioned in a previous comment (please see above), fortunately, there are only 2 broad ways of thinking about the origin and evolution of viruses: they evolved from simple to complex by increasing the size of their genome and the complexity of their proteome, or from complex to simple (reviewed here: http://precedings.nature.com/documents/3886/version/1).

Although it is clear that the extant viruses and their recent viral ancestors have occasionally acquired new genetic material, most of the data, including that produced by Nasir et al., supports the paradigm that overall, the current viral lineages are evolving by reductive evolution. Probably the most convincing argument for the reductive evolution of viral lineages is the evolutionary pattern of the thousands of extant intracellular parasitic or symbiotic cellular lineages, which to my knowledge have been evolving, without exception, by reductive evolution.

If that is the case, then the question is: why would parasitic or symbiotic viral lineages evolve any other way?