Saturday, September 09, 2006

The hypocrisy of most projects with "Open" data release

There has been a growing trend in biological research for scientists to release their data in some way or another prior to publication. This data release is meant to promote the advancement of science, and it frequently does. This is perhaps best seen with genome sequencing projects, such as the public version of the "Human Genome Project." In many if not most cases, the centers that do the bulk of the sequencing work release the sequence data for searching by others, even before publishing papers on their own data. In most cases, restrictions are placed on how the data can be used, but the data is still released for others to look at.

This is of course in contrast to how much of science works, with researchers keeping their data to themselves until they are ready to publish something. The genome centers that have made their data available prior to publication deserve some credit for this openness, especially since their data release has gone so far above and beyond what most biology researchers do. In fact, many of these centers go out of their way to promote getting such credit (they even got Clinton and Blair to play along). The best example of this was the public human genome project, which made multiple claims about how great it was for humanity for releasing the data "within 24 hours of gathering it." This data release policy was captured in something that became known as the Bermuda Principles, named for a meeting that took place in Bermuda (see a nice summary of this by John Sulston here).

What is appalling to me, however, is that these same centers that try to take credit for their openness then turn around and usually publish their papers in non-Open Access journals (for those who do not know, this means that one has to pay money, frequently enormous sums of money, just to read the paper). I do not understand this. A paper about an analysis someone did on a data set may in fact be more valuable to the community than the data itself. If genome centers like TIGR, JGI, Sanger, Whitehead, etc. really wanted to be on the side of openness, they would stop publishing their papers in non-Open Access journals. Unfortunately, these places publish very few of their papers in Open Access journals.

For example, the Joint Genome Institute (JGI), which I am now affiliated with, is continually showing two faces on this issue. On the one hand, they issue press release after press release regarding their release of data on various genome projects (e.g., here). That is fine, although a little over the top sometimes. But then they almost never publish any of their work in Open Access journals (e.g., see their latest press release on a paper published about a genome in Science, a non-Open Access journal). Any taxpayers out there should be disappointed with this, as the genome centers get TONS of money to carry out this work for the public benefit. For the papers on the work to then be hidden behind huge subscription fees is a waste of your money.

This is particularly surprising coming from JGI, since JGI is run directly by the Department of Energy (unlike most other centers, which are either private or part of a university). Thus DOE apparently does not want to follow even the recommendations of Congress and the Senate regarding Open Access to publications. Nor does DOE apparently want to do the right thing by requiring its institutes to publish in Open Access journals. Too bad. Taxpayers hopefully will begin to get more and more upset about the waste of their money as these centers take enormous amounts of the federal science budget and convert it into documents that only a few can read.

1 comment:

  1. But the data that the genome centers make available immediately are trace files, right? Those aren't of use except to a few people with the know-how and computing power to assemble them. The more useful data are assemblies and annotations, which don't usually get deposited into databases until later.

    I'd argue that the assemblies and annotations are worth more than the published manuscript. These must be open access, and I believe they all are.

