Wednesday, December 19, 2012

Great ideas in this #PLoSOne paper; except we published same idea 12 fu$*# years ago

Figure 1 from Pollock et al. 2000
Wow. I mean, imitation is a form of flattery. But this paper ... Grrrrrrr

PLOS ONE: Conveniently Pre-Tagged and Pre-Packaged: Extended Molecular Identification and Metagenomics Using Complete Metazoan Mitochondrial Genomes

In the paper the authors basically argue that for many purposes, including phylogenetic studies in particular, one could obtain many mitochondrial genomes at once by just pooling together samples from different organisms, shotgun sequencing the samples, and assembling the separate mitochondrial genomes out. All one would need to do is to make sure the organisms pooled were distantly related enough such that their mitochondrial sequences would not cross assemble with each other. They say things like:
We propose a novel approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome. It relies on the properties of metazoan mitogenomes for enrichment, on careful choice of the organisms to multiplex, as well as on the wide collection of accumulated mitochondrial reference datasets for post-sequencing sorting and identification instead of individual tagging. Multiple divergent organisms can be sequenced simultaneously, and their complete mitogenome obtained at a very low cost. We provide in silico testing of dataset assembly for a selected set of example datasets.
We describe here the approach, the type of sequence data it generates, the procedure to recover mitochondrial genomes without external tagging, and some potential uses. We perform an in-silico validation test based on the analysis of a simulated dataset with read lengths of two different sizes to represent average read length of three 2nd generation desktop sequencing platforms, Illumina Mi-Seq, 454 GS junior and Ion Torrent PGM. Thus we can contrast their relative efficiencies for the experimental protocol proposed here.
Sounds great. Except I wrote a paper with David Pollock, Norman Doggett, and Michael Cummings published in 2000 proposing the same thing. Our paper: Pollock DD, Eisen JA, Doggett NA, Cummings MP. A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol. 2000 Dec;17(12):1776-88.

 Our abstract:
Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combination of DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. 
In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.

And there is no mention of our paper in this new one. I could do a detailed side-by-side comparison but I am too angry right now. It's either stealing on purpose or just shoddy work. I think stealing is unlikely so I will conclude just poor work. Shoddy job by the authors (Dettai A, Gallut C, Brouillet S, Pothier J, Lecointre G et al.). Shoddy job by the editor Dirk Steinke from Guelph. Annoying as all heck.

UPDATE 11 AM 12/22: I got carried away with anger when I wrote the last few sentences crossed out above.  Upon further, more rational consideration, I do not think the authors or editors did anything really wrong here.  Yes, they missed some prior literature on the topic and our prior paper is indeed quite similar to theirs.  But our prior paper is pretty hard to find by literature searches (see comments/discussion) and they clearly came up with their ideas independently.  I truly regret the aggressive, obnoxious tone of my post and sincerely apologize to the authors of the new paper.

PS. I wish to thank @DrShmoo on Twitter for knocking some sense into me.


  1. Let's not forget: shoddy work by the peer-reviewers.

  2. Dear Prof. Eisen,
    I have read the commentary you published on our article « Conveniently pre-tagged and pre-packaged », as well as the article you have written on your blog. As main author of the article, I must say I am of course quite affected by what you wrote and I intend to clarify the situation as quickly and as efficiently as possible.
    While performing bibliographic research before publication of our article, neither I nor my co-authors encountered your article, nor references to the approach in other articles (I of course read it immediately upon reading your comments). This was clearly an oversight on our part and we are sorry that it resulted in our not referencing your work.
    The best explanation I can offer is that we were approaching the problem through the prism of new sequencing technologies. I also focused our search more on the last seven years, when they emerged. Similarly, we checked a very large number of descriptions of the complete mitochondrial genomes in GenBank to look for NGS sequencing and followed up on the techniques used, but we did not encounter yours, and neither had any of the colleagues we discussed it with.
    We will of course issue a correction to our article to acknowledge the work you published twelve years ago. Please be sure that neither our article nor the rest of our work testing our method was in any way copied from yours: we developed the approach independently.
    Best regards,
    Agnes Dettai

    1. Thanks for the reply Agnes. I truly appreciate it.

  3. Dettai et al. also fail to cite another paper that proposes a similar method of multiplex sequencing of mitochondrial genomes (which itself fails to cite the Pollock et al. paper):
    Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics

    1. Thanks Tony. Just found that last night too. I think, in the end, I am going to have to accept that we are in part to blame for people missing our paper, since the abstract is clearly not reaching people ... I wonder what the policy is for rewriting a paper abstract and publishing an erratum 12 years later.

  4. I'm having trouble digesting this. A quick lookup on Google Scholar says that the Pollock et al. paper has been cited 63 times, which is not the worst citation count of my papers. I'm not entirely sure what the response should be. Maybe a quick note describing the lineage, and trying to bring in the appropriate keywords that people seem to want to use today, that I guess we didn't in the original? In this case, the politics and vagaries of science may have a lot to do with why the technique is not used more. In our own work, it seems more interesting to do sample sequencing from genomes, which gets the mitochondria as well as a good proportion of the highly repetitive transposable elements. Also, I thought when we published this 12 years ago that the era of publishing single mitochondrial genomes (one at a time) was dead, but this did not prove to be the case. It seems there is some tendency to want to publish them one at a time (more papers, tenure more likely), and if you sequence a bucketload at once it is harder not to publish them all at once. Nevertheless, if we were simply over a decade ahead of our time, it still seems that we ought to be credited with the concept. The main difference in the strategy today is cost.

    1. We definitely should get credited with the concept. I am willing to grant the authors of these two papers some benefit of the doubt, especially given that our paper does not come up as a top hit in some searches I did.

  5. Dear Prof. Eisen,
    We deeply appreciate the update. We will submit the corrections very soon, incorporating both missed papers at all the relevant places. I hope the approach really launches (we will be working on it anyway), because I believe it can be incredibly useful.
    Thank you very much, and Merry Christmas to all,
    Agnes Dettai

  6. A quick comment on this affair: There are so many papers published nowadays that people can't be expected never to miss a relevant paper when they do their bibliography. I blame "publish or perish" for this.

