Tuesday, October 12, 2010

Figuring out figures in scientific papers: new search/ranking method outlined in PLoS ONE paper

Just a quick post here. A colleague just sent me a link to her fascinating new paper in PLoS ONE: "Automatic Figure Ranking and User Interfacing for Intelligent Figure Search".

In this paper, Hong Yu of the University of Wisconsin-Milwaukee and colleagues describe a system for better automated characterization of figures from scientific papers. The system is available through their web server "Ask Hermes".

If you want to learn more about the system, I suggest you read the paper or watch their video.

The general idea is summarized in the Background section of their abstract:
Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats each figure equally, but we introduce a novel concept of “figure ranking”: figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery.
I particularly like that they also allow searching for open access figures only, which may be of significant value to people who want to, for example, make a slide presentation with no copyrighted/protected material in it. For example, see the results of a search for open access figures using the keyword phylogenomics.

Anyway, this is definitely worth checking out.

Yu, H., Liu, F., & Ramesh, B. (2010). Automatic Figure Ranking and User Interfacing for Intelligent Figure Search. PLoS ONE, 5(10): e12983. DOI: 10.1371/journal.pone.0012983


  1. You know a lot more about it than I do, but just because the articles are open-access doesn't mean that the figures are free from copyright. Both PLoS and BMC use the CCAL, so anyone can reuse figures from those journals in e.g. a presentation. However, such use is not free of conditions, and is not free of copyright restrictions. Don't the original authors retain copyright for works published in PLoS and BMC?

  2. The CC licenses used by PLoS are VERY broad and allow unlimited, unrestricted reuse of the material as long as the original source is cited. The authors do retain copyright, and I guess that could come with some rights, but by also agreeing to the CC license they set the material "free" (as in freedom). This means nobody needs to get permission to redistribute or use the material.

  3. I am curious if a search by keyword (e.g. one so general as "interferon") is really useful enough. Are there plans for more faceted or targeted searches?

  4. I don't know, Alyssa; it's not my system.

  5. Hi, I am the developer of the system and would like to thank Jonathan for bringing attention to this system.

    Shaun, copyright is definitely a contentious issue. As you pointed out, open-access articles are indeed protected by copyright, but the terms for redistributing them are more liberal. As far as the CCAL is concerned, we did not find anything that prohibits reuse of figures or other material from such articles.

    Alyssa, we built FigureSearch with simplicity in mind, inspired by search engines such as Google, where a single search box handles every query. At the same time, we do provide search over some facets through the Advanced Search option. Nonetheless, FigureSearch is a work in progress, and I'd be thrilled to get your and others' feedback on how we could improve it. So please feel free to send me your thoughts and 'wishlist'.

  6. Alyssa and Jonathan,

    Yes, we plan to continue to develop NLP for focused search. It is in our pipeline!

