Sunday, June 15, 2008

A million minds getting together can be confusing but might end up being really cool

There is a possibly interesting paper in Genome Biology by Barend Mons et al.: "Calling on a million minds for community annotation in WikiProteins." I say "possibly" because the paper itself is quite confusing to me, but the overall goal seems to be a cool concept. This group has created, and is encouraging the use of, "WikiProteins," a community annotation system for "community knowledge." Sounds a bit fuzzy? Well, reading the paper does not completely help. For example, here is the abstract:
WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at webcite.
I confess I got really lost reading this. But I moved on, since the overall concept seemed quite intriguing even if I did not get it completely. It did not get much clearer further on, though. For example, consider their description of a "knowlet":

The future outlook to integrate data mining (for instance gene co-expression data) with literature mining, as formulated in the review by Jensen et al. [2], is at the core of what we aim for at the text mining/data mining interface. To support the capturing of qualitative as well as quantitative data of different natures into a light, flexible, and dynamic ontology format, we have developed a software component called Knowlets™. The Knowlets combine multiple attributes and values for relationships between concepts.

Scientific publications contain many re-iterations of factual statements. The Knowlet records relationships between two concepts only once. The attributes and values of the relationships change based on multiple instances of factual statements (the F parameter), increasing co-occurrence (the C parameter) or associations (The A parameter). This approach results in a minimal growth of the 'concept space' as compared to the text space (Figure 1).

OK ... I got lost every time I tried to read this in detail. I do think they would benefit greatly from translating their paper out of the language used by people who work on text mining and into something aimed at a broader audience.
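
For what it is worth, here is my best guess at what the quoted passage is describing, written as a tiny Python sketch. This is only my reading, not their software: the class names, the idea that a relationship is an unordered pair of concepts, and the use of simple counters for the F and C parameters are all my assumptions (I left the A parameter as a placeholder, since the passage does not say how it is calculated). The point it tries to show is that repeated statements about the same two concepts update one stored relationship rather than adding new ones, so the "concept space" grows more slowly than the text.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """One stored relationship between two concepts (hypothetical layout)."""
    factual: int = 0          # stand-in for the F parameter (assumed to be a count)
    cooccurrence: int = 0     # stand-in for the C parameter (assumed to be a count)
    association: float = 0.0  # placeholder for the A parameter; not computed here


class ConceptSpace:
    """Keeps at most one Relation per pair of concepts (my assumption: unordered)."""

    def __init__(self):
        self._relations = {}

    def observe(self, concept_a, concept_b, factual=False):
        # The same pair always maps to the same key, whatever the order.
        key = frozenset((concept_a, concept_b))
        relation = self._relations.setdefault(key, Relation())
        relation.cooccurrence += 1
        if factual:
            relation.factual += 1
        return relation

    def __len__(self):
        # Number of distinct relationships, not number of sentences seen.
        return len(self._relations)


# Three made-up sentences mentioning the same two (placeholder) concepts.
space = ConceptSpace()
space.observe("gene X", "disease Y", factual=True)
space.observe("disease Y", "gene X")
latest = space.observe("gene X", "disease Y")
print(len(space), latest.cooccurrence, latest.factual)
```

Three sentences go in, but only one relationship is stored; only its attributes change. If that is roughly what a Knowlet does, it is a much simpler idea than the paper makes it sound.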

But reading between the lines, this is a new, apparently open-access system for gathering community annotation of "concepts" and of the relationships among concepts in the biological sciences. Those concepts could include a wide range of things, such as genes, genomes, and proteins, as well as more standard concepts like functions. Whatever this system is, it seems worth checking out.

I leave you with their ending:

Once widely used and augmented, this resource could become an open, yet quality assured and comprehensive, environment for collaborative reference and knowledge discovery.

Now that, I can say, I understand, and it sounds good to me. If anyone out there has any more insight into this, please give your input.


  1. Thanks. I missed Euan's post. Looks like they had some disagreements though it is not clear what the resolution is. My problem was more with the writing itself ... it seemed to be geared too much to the insiders and not to the people they should be recruiting to contribute.

