The Tree of Life: Interesting new metagenomics paper w/ one big big big caveat

Friday, February 03, 2012

Interesting new metagenomics paper w/ one big big big caveat - critical software not available

Very very strange. There is an interesting new metagenomics paper that has come out in Science this week. It is titled "Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota" and it is from the Armbrust lab at U. Washington.

One of the main points of this paper is that the lab has developed software that apparently can help assemble the complete genomes of organisms that are present in low abundance in a metagenomic sample. At some point I will comment on the science in the paper (which seems very interesting), though as the paper is not Open Access I feel uncomfortable doing so, since many of the readers of this blog will not be able to read it.

But something else relating to this paper is worth noting and it is disturbing to me. In a Nature News story on the paper by Virginia Gewin there is some detail about the computational method used in the paper:

"He developed a computational method to break the stitched metagenome into chunks that could be separated into different types of organisms. He was then able to assemble the complete genome of Euryarchaeota, even though it was rare within the sample. He plans to release the software over the next six months."

What? It is imperative that software that is so critical to a publication be released in association with the paper. It is really unacceptable for the authors to say "we developed a novel computational method" and then to say "we will make it available in six months". I am hoping the authors change their mind on this but I find it disturbing that Science would allow publication of a paper highlighting a new method and then not have the method be available. If the methods and results in a paper are not usable how can one test/reproduce the work?

23 comments:

Nick Loman 2/03/2012 7:29 AM
I already posted this on Twitter but I really think that peer reviewers have a responsibility to insist that software developed for a manuscript is both available and open-source before publication. Ideally this would be in some trusted location like Github, Sourceforge or Google Code. This also means reviewers can access it without giving away their identity (if this is an issue for them; I don't usually care and have taken to signing my reviews).
Titus Brown 2/03/2012 7:49 AM
Our software implementation to do a similar thing (we don't split the graph heuristically) is, in fact, on github. And hey, look, the submitted paper is available, too! http://arxiv.org/abs/1112.4193. It's still in review, though.
Titus Brown 2/03/2012 7:56 AM
And Jonathan, I'm happy to comment on the science for you, since we've been pursuing this approach for about 2 years, although I would need to run some tests on their data set first. From skimming, the only real weakness is that they run an assembly first, and then partition the assembled data. Since many assemblers perform poorly on raw metagenomic data, this is unlikely to be as comprehensive as it could be. Also note that similar-in-style (although more heuristic) approaches were used in the rumen paper (Hess et al.) and the Arctic permafrost paper (Mackelprang, 2011). Good stuff, all in all.
Titus Brown 2/03/2012 7:58 AM
Last comment: see http://armbrustlab.ocean.washington.edu/seastar. Right now it says "will be updated week of Feb 6th."
Julie 2/03/2012 8:00 AM
Is this really a "rare" bug given it made up 7.5% of the sample? I would also note, from that news story, that many Euryarchaeota have been cultured and sequenced, just not this one! "One of those genomes came from the Euryarchaeota, a widespread group of marine microorganisms, none of which have been grown in culture or sequenced."
Jonathan Eisen 2/03/2012 8:00 AM
Yes - it says "information will be updated" - it does not say software will be made available
Jonathan Eisen 2/03/2012 8:07 AM
I note - I have written to the software developer to encourage him to make it available ASAP ...
Jonathan Eisen 2/03/2012 9:30 AM
Note - Virginia Gewin did contact me about commenting on the paper but we did not connect so I did not talk to her about her story. I was contacted by Biotechniques and they wrote an article.
-DG 2/03/2012 1:16 PM
There is always the possibility that the authors will make the "as is, used in the publication" code available to anyone on request and what is being released within the next 6 months is the user friendly full on useful program. Fairly normal in my experience to be ready to publish before you really have a nice user-friendly implementation of your software ready for release.

But of course the code/scripts you used in the publication need to be available right at the time of publication, even if they would, at that point, be less than useful for most researchers. It at least allows inspection for bugs and verification of results.
Shaun 2/03/2012 5:00 PM
This is truly annoying. In my opinion, the reviewers aren't doing their jobs if they haven't run the software in a paper like this (even if just on demo data, and yes, all computational biologists should provide an executable demo with their software).

I can point you to a Nature paper from a few years ago where software that was crucial to the findings was just described as "manuscript in preparation". Guess what -- the manuscript never appeared!
caseybergman 2/04/2012 12:13 AM
While I agree with Nick & Shaun that reviewers should help in the policing especially when journal guidelines are lax/ambiguous, in this case the authors (and editorial staff) are not even abiding by Science's own guidelines set out by Hanson, Sugden, Alberts in their editorial "Making Data Maximally Available": http://www.sciencemag.org/content/331/6018/649.full

"To address the growing complexity of data and analyses, Science is extending our data access requirement listed above to include computer codes involved in the creation or analysis of data."
Neil 2/04/2012 4:13 AM
I blogged about "missing software" in papers recently. It drives me nuts. I agree that this constitutes improper reviewing and editorial practice. It's bad for science.
Ross Mounce 2/04/2012 4:40 AM
This paper clearly doesn't abide by the Science Code Manifesto: http://sciencecodemanifesto.org/

I suggest everyone reads and endorses those sound principles (if they haven't already!).

Much like the Panton Principles (http://pantonprinciples.org/) they're a simple, clear set of guidelines on the use of software in academic publications.
Mike Taylor 2/04/2012 7:14 AM
You think that's bad? What about Stevens, Kent A., and J. Michael Parrish. 1999. Neck Posture and Feeding Habits of Two Jurassic Sauropod Dinosaurs. Science 284:798-800 -- http://www.sciencemag.org/cgi/reprint/284/5415/798.pdf

That came out thirteen years ago, and describes a then-new program for manipulating virtual 3D models of bones, in particular dinosaur neck bones. The software's never been released, so no-one's ever been able to even attempt replicating their results. (Disclosure: I think their results are flawed, and have published on the subject.)
Paul 2/06/2012 7:23 PM
This paper is mentioned in the NY Times today... alas, no mention of the software not being made available.

http://www.nytimes.com/2012/02/07/science/euryarchaeota-has-never-been-seen-but-now-its-genome-has.html?ref=science

Paul
http://www.ipscell.com
pyrimidine 9/20/2012 6:43 AM
This article is worth a follow-up. We're now edging into fall and the website was supposed to have released all code by now. They have released only the first part/phase of three (and the initial code release on github hasn't seen activity since the creation of the repo), which is yet another example of why we need more adherence to standards like the Science Code Manifesto and the practices outlined for the Bioinformatics Testing Consortium.
Anonymous 9/20/2012 6:36 PM
Sometimes it seems that people from biological areas still don't see software and code as true scientific results or as part of the scientific method. Wet-lab results are mandatory and held to some level of quality, while those based on computational methods can be messy, poorly described, and uncontrolled. A paper describing results based on new software, without the software, doesn't make sense; we can only believe what is said.