Monday, December 12, 2011

Pluses and minuses of Wikipediafying your database

There is a really interesting article just out from Robert Finn, Paul Gardner and Alex Bateman: Making your database available through Wikipedia: the pros and cons. The article is part of the Nucleic Acids Research Database Issue.

The abstract does a good job of summing up the article:
Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content; with many pages written on scientific subject matters that include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question about the future role of dedicated database biocurators in context of the thousands of crowdsourced, community annotations that are now being stored in wikis.

They discuss many of the worries people have about integrating their databases with Wikipedia and work through several examples, including the 11 databases in this NAR Database Issue that use wikis.

Regarding "vandalism", where someone makes inappropriate edits, the authors report:
"Overall, inappropriate editing is rare for pages related to biology and occurs at very low rates. Since Rfam started to use Wikipedia to manage the textual annotation of RNA families in 2007, ∼1% of all edits have been reverted, by the Rfam curators or the greater Wikipedia community, suggesting that they may be vandalism (2). Similar numbers are presented for the GeneWiki project (3), with these authors estimating that vandalism is observed less than one page view in every 3000, a rate much lower than other articles in Wikipedia."
They go on to discuss other issues associated with Wikipediafying a database, including how best to do the integration, how to track changes, and how to leverage the Wikipedia community. They also discuss how to deal with the fact that Wikipedia does not allow "original research", which rules out, for example, a user's conjecture about the function of some protein. Finally, they consider whether the use of Wikipedia will lead to decreased funding for full-time curators (they do not think so).

They conclude:
Wikis are undoubtedly changing the way some biological databases operate, providing an established solution to community annotation. Adopting a wiki means that a particular resource is no longer a closed, static resource (between public data releases) that cannot be improved by its users. The challenge is now to get scientists en masse to generate and edit articles. How editors receive credit for their work on the article is unclear. Assuming they work on the subject area, wiki articles provide them an opportunity to showcase their work in context of the field at the very least. Curation of biological data must be crowdsourced if there is any chance to comprehensively annotate the vast datasets that are being generated and the scientific community should feel responsible. Hopefully this article has provoked thoughts as to how a wiki, especially Wikipedia, may work for a resource that you are responsible for or use. The growing number of databases using wikis suggests that they are here to stay, we now face the issue of how to overcome the social engineering required to get everyone involved.
I personally agree with many of their conclusions (e.g., the issue of credit needs to be worked out better). There is no doubt that, for me, this article had the desired effect of provoking me to think a bit more about how wikis could be used for various projects in which I am involved.

This paper is definitely worth a read.

Other related things worth reading on this topic:
