The Tree of Life: Oh the irony - new #OpenAccess #PLoSOne paper on Research Blogs doesn't share data behind analyses.

Saturday, May 12, 2012

Interesting new paper: PLoS ONE: Research Blogs and the Discussion of Scholarly Information. It is all about the new world of science blogging, and much of the context relates to openness. Yet as far as I can tell, the data that make up the meat of the analyses in the paper are not shared. Uggh.

Is there something I am missing here? Shouldn't a prerequisite of publishing this kind of paper be sharing the information / data used in the analyses? Shouldn't that be released with the paper?

Definitely time to start "Open Data Watch", where people have a place to complain about the lack of open availability of data behind papers (I came up with the name as a mimic of Ivan Oransky's diverse watch sites like Retraction Watch). Originally, in thinking about doing this, I had been thinking about genomic data. But I am sure this is a problem in other areas. Consider paleontology, where open access to fossils and other samples is, well, not as common as it should be. It is not that hard anymore to find a place to share one's data. With places like Data Dryad and Biotorrents and FigShare and Merritt and hundreds of others, it is really inexcusable not to share the data behind a paper in most cases. Certainly, in some cases there may be privacy issues, but that is not the case here (I think) and not an issue in most cases.

Come on, people. If scientific papers are to be reproducible and testable, you need to give people access to the data you used.

Shema, H., Bar-Ilan, J., & Thelwall, M. (2012). Research Blogs and the Discussion of Scholarly Information. PLoS ONE, 7(5). DOI: 10.1371/journal.pone.0035869

13 comments:

Bob O'H 5/12/2012 1:29 AM
Shouldn't a prerequisite of publishing this kind of paper be sharing the information / data used in the analyses?
From the editing & publishing policies of PLoS One:
Publication is conditional upon the agreement of the authors to make freely available any materials and information described in their publication that may be reasonably requested by others for the purpose of academic, non-commercial research.
Anonymous 5/12/2012 5:58 AM
Build it and they will come... I could list at least a dozen examples like this off the top of my head, and I for one would certainly contribute to a system that could log failures to share data.

In this case, since it is a PLoS ONE paper, I would leave a comment on the paper highlighting that the data are not available and suggesting that the authors submit, e.g. to Dryad.
Titus Brown 5/12/2012 7:24 AM
I do not understand why journals let people get away with this.
Hadas Shema 5/12/2012 1:53 PM
You're right. It has more to do with me switching computers, etc. than anything else. I'll try to upload them when things calm down a bit.
Anonymous 5/12/2012 5:47 PM
I am all for requiring people to provide raw data. But I would not be 100% strict about it and would provide room for exceptions, approved by editors, etc.

How about:

- data that are essentially written in pencil on reams and reams of paper
- data that make no sense until they are visualized (and where analysis can begin only once the numbers are turned into images, not before, and images are provided in the main body of the paper)
- qualitative or subjective data, all or a representative sample of which is provided in the main body of the paper
- data that can be processed in only one way, so everyone would always get the same resulting numbers
- data formatted by software that cannot be read by any computer younger than 20 years?

And assume that the lab is gone, the people are gone, the equipment is gone, the money is gone, and there are no resources to type tens of thousands of handwritten numbers into an Excel sheet, or to translate data from old software to new (probably expensive) software.

Should such papers be prevented from being published?
Aaron Darling 5/14/2012 1:10 PM
"- data that can be processed in only one way, so everyone would always get the same resulting numbers"

This is an important consideration for computationally oriented papers with simulated datasets. The simulated datasets can often be generated quickly and repeatably using the same pseudorandom number generator and seed, but storing these datasets might take a million or more times the space needed to store the program that generates them.

This might also be true for MCMC-type analyses, where the same chain can be regenerated from only the random number seed, but that can be more compute-intensive than simple simulations.
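
As a rough illustration of the seed-based approach described above, here is a minimal Python sketch. The function names, the toy DNA-like data, and the idea of publishing a checksum next to the seed are all assumptions made for this example, not anything taken from the paper or from a particular simulation package. It only shows that a fixed seed regenerates the same dataset, so the small generator script plus the seed can stand in for the stored data.

```python
import hashlib
import random


def simulate_dataset(seed, n_sequences=1000, length=200):
    """Regenerate a toy simulated dataset from a seed (hypothetical example)."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n_sequences)]


def fingerprint(dataset):
    """SHA-256 checksum that could be published alongside the seed."""
    digest = hashlib.sha256()
    for seq in dataset:
        digest.update(seq.encode("ascii"))
    return digest.hexdigest()


# Two runs with the same seed give the same data; a different seed does not.
first = simulate_dataset(seed=42)
second = simulate_dataset(seed=42)
other = simulate_dataset(seed=43)
```

Comparing `fingerprint(first)` with `fingerprint(second)` lets a reader confirm they regenerated the same data without the full dataset ever being archived. One caveat: exact reproducibility also depends on the generator implementation, so the interpreter and library versions would need to be recorded along with the seed.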