Tuesday, May 10, 2011

Strange things at #PLoS; a public call to get rid of the constraints of describing author contributions

Well, I am working with some others to submit a paper from a DARPA project to PLoS Computational Biology. And yet again, we have to fill out this form regarding author contributions. And yet again, I am baffled by this. PLoS can be so wise in some areas of publishing, yet remarkably uncreative in others. They ask you to say which authors "Conceived and designed the experiments," which "Performed the experiments," which "Contributed reagents/materials/analysis tools," which "Analyzed the data," and which "Wrote the paper." This has always seemed completely inane to me. First of all, this just does not work for some types of scientific research. Plus it seems so forced and arbitrary.

Why not actually let the authors say who did what in their own words? You can, I note, sort of get around this by badgering the copy editors a bit (e.g., see my PLoS ONE paper, "Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees," where we added some additional categories: "Ideas and discussion," "Built microbial genome database," "Analyzed sequences linked to RecA and RpoB clusters," and "Analysis of distributions of sequences in GOS data").

Even Nature lets the authors use their own words. For example, in my Genomic Encyclopedia paper published with Nature's Creative Commons license for genome papers we wrote:

"D.W. (rRNA analysis, gene families, actin tree, manuscript preparation), P.H. (selection of strains, analysis, manuscript preparation, project coordination), L.G. and D.B. (project management), R.P., B.J.T., E.L., S.G., S.S. (strain curation and growth), K.M., N.N.I., I.J.A., S.D.H., A.P., A.Ly. (annotation, genome analysis), V.K. (CRISPRs, actin), M.W. (whole genome tree), P.D., C.K., A.Z. and M.S. (actin studies), M.N., S.L., J.-F.C., F.C. and E.D. (sequencing), C.H., A.La., M.N. and A.C. (finishing), P.C. (analysis), E.M.R. (manuscript preparation), N.C.K. (selection of strains, annotation, analysis), H.-P.K. (strain selection and growth, DNA preparation, manuscript preparation), J.A.E. (project lead and coordination, analysis, manuscript preparation)."

Which is more useful? I think, without a doubt, the constraints imposed by the PLoS system obfuscate what people did. And it is so unnecessary. Here's a public call for PLoS to get rid of this constraint. (I am sure some at PLoS will give me grief for a public call like this, but hey, it is the Public Library of Science, right?) It seems completely inconsistent with many other aspects of PLoS publishing. Let the authors describe what they did in their own words.


  1. Well, it may be that the fields PLoS describes are too limited, but there's a problem with making them completely free-form; such comments can't be data-mined in any meaningful way. The future of science will involve computational analysis of papers and standardized fields are a way forward to this.

  2. I bet Google would take that as a challenge. We live in the age of n-grams, after all. :-)

    More seriously, I expect that even with free-form author contribution fields, there would still exist enough commonalities that sifting the corpus would find regularities. Slight variations on "data analysis", "simulation code" and "wrote the paper" will appear in many fields. In fact, free-form author contribution data could be a great boon to the science of doing science: we could tell which types of acknowledged contribution are common to many disciplines, which are concentrated in particular sub-fields, if any occur in sub-fields we didn't expect were related, and so on. PhD theses in library science are waiting to be written, here.

    Free-form author contributions are more informative to human readers; the challenge would be to make them informative to computers, too. The next challenge after that would be to prevent people from coining words like "contributionomics", but I think we just lost that battle.

  3. The academic editor-in-chief of PLoS Biology, blogging about a PLoS policy that ought to be changed with a stroke of the editor's mighty pen? Maybe there's something else amiss at PLoS!

  4. It's according to the ICMJE criteria so it's not arbitrary, although that doesn't necessarily justify structured vs free-form descriptions. If we don't ask these things clearly, authors tend not to tell us clearly who did what.

    Clear attribution of authorship is important in science publishing, e.g. see http://journalology.blogspot.com/2007/07/not-being-clear-about-authorship-is.html

    (p.s. personal comments, not intended to be a PLoS staff response)

  5. @jon B: That might be true at the moment, but aren't we all hoping that future data-mining will be sufficiently sophisticated to retrieve directly from free-text?

  6. I am not buying the need for formal descriptors here. Yes, Matt, I realize some people out there have pushed for more formalism as a way of getting authors to say who did what and to try to limit ghost authorship. But I fundamentally disagree that this is wise. We are evolving towards all sorts of different forms of publication, from publishing data to publishing videos to nanopublications. I get that metadata is useful and have been advocating it for a long time in the context of my work. But I believe that perhaps one of the most important parts of a paper where we need flexibility is the description of who did what. It is from that description that we can discover how papers really work. Filling out this form PLoS uses is, I would argue, of almost no value.

  7. Jonathan – I agree that the categories we use for indicating the role that authors have played in a particular article might not be perfect (far from it, I hear you say). But I also agree with the earlier comment that making these fields ‘minable’ could be very useful, especially if we could arrive at more standardized vocabularies which are used by many journals. So, I’d say we should try and balance standardization with flexibility (that’s why we do have a free text box for ‘other’ contributions), and move towards an approach that is generally more useful for research assessment.

  8. Mark - mining genome data can be useful. Mining information in figures and text of a paper can be useful. Mining information on anything can be useful. However, any constraint placed on how you describe things is a cost. The benefits (real or potential) need to outweigh that cost. I see almost no potential benefit here to standardizing the description of contributions of authors in papers in the way it is currently done in PLoS papers. The free text box is nice but most people don't bother using it. By putting those categories in place, PLoS almost certainly generates enormous amounts of misinformation. Until someone shows that there is a benefit to this, I would recommend scrapping it or maybe adding it as an "experimental" form that people can fill out in addition to actually describing who did what.

  9. I agree with Jonathan. There may be marginal value in structuring the author contribution sections (though I sure don't know what) but surely it's utterly dwarfed by other pressing problems that need to move forward. For example, if we're concerned about being able to mine the literature, one priority is how publishers will archive and serve up large data sets. PLoS should be a leader in this -- but instead, its submission system expects to convert large datasets to PDF.

    Structuring the human-readable text of one of the least scientific parts of the paper would be far down my to-do list. Who cares if the author contributions are data-minable, if the science isn't?

  10. I agree with Jonathan as well. When we submitted a paper to PLoS Neglected Tropical Diseases last year (PubMed ID 20808766), we had to use the same categories to denote author contributions even though the paper was essentially a bioinformatics paper and the biggest single task was creating and updating a database. Is a database a "reagent"? It was tempting to leave "Conceived and designed the experiments" and "Performed the experiments" unattributed, since there were no experiments in the usual sense of the word....

