The Tree of Life: How Open Are You? Part 1: Metrics to Measure Openness and Free Availability of Publications

Sunday, June 16, 2013

For many, many years I have been raising a key question in relation to open access publishing: how can we measure how open someone's publications are? Ideally we would have a way of measuring this in some sort of index. A few years ago I looked around and asked around and did not find anything of obvious direct relevance to what I wanted, so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic. Here is what I wrote (in January 2013) but never posted:

With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short-term solution) and moving more towards open access publishing (a long-term solution). One key component of the move to more open access publishing will be assessing people on just how good a job they are doing of sharing their academic work.

I have looked around the interwebs to see if there is some existing metric for this and I could not find one. So I have decided to develop one - which I call the Swartz Openness Index (SOI).

Let A = # of objects being assessed (could be publications, data sets, software, or all of these together).

Let B = # of objects that are released to the commons with a broad, open license.

A simple (and simplistic) metric could be simply

OI = B / A

This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.

A and B as above.

Let C = # of objects available free of charge but not openly

OI = ( B + (C/D) ) / A

where D is the "penalty" for making material in C not openly available
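To make the two draft formulas above concrete, here is a minimal Python sketch. The function name `openness_index`, the example counts, and the choice of D = 2 are all made up for illustration, and the requirement that D be at least 1 is my own assumption about what "penalty" should mean.

```python
def openness_index(a: int, b: int, c: int = 0, d: float = 1.0) -> float:
    """Compute OI = (B + C/D) / A using the definitions from the draft.

    a: number of objects being assessed
    b: number released to the commons with a broad, open license
    c: number available free of charge but not openly
    d: penalty applied to the objects counted in c
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if d < 1:
        # Assumption: a penalty below 1 would act as a bonus, so reject it.
        raise ValueError("d should be at least 1")
    if b + c > a:
        raise ValueError("b + c cannot exceed a")
    return (b + c / d) / a


# Hypothetical example: 10 objects, 6 fully open, 2 free but not open.
simple_oi = openness_index(10, 6)              # OI = B / A
penalized_oi = openness_index(10, 6, 2, 2.0)   # each free-only object counts half
print(simple_oi, penalized_oi)
```

With c left at 0 the function reduces to the simple OI = B / A, so the same helper covers both versions.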

This still seems not detailed enough. A more detailed approach might be to weight diverse aspects of the openness of the objects. Consider for example the "Open Access Spectrum." This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability. And each of these is broken into levels that assess the degree of openness. Seems like a useful parsing in some ways. Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND license I cannot technically make derivatives of it. So I will not. Mostly because I am pissed at PLOS and SPARC for releasing something in this way. Inane.

But I can make my own openness spectrum.

And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use. I had a heated discussion with people from PLOS and SPARC about this but I am not sure if they updated their policy. Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill. And it only just now came back to me. (Though I note - I did not find the Draft post I made until AFTER I wrote the rest of this post below ... ).

To get some measure of openness in publications maybe a simple metric would be useful. Something like the following

P = # of publications
A = # of fully open access papers
OI = Openness index

A simple OI would be

OI = 100 * A/P

However, one might want to account for relative levels of openness in this metric. For example

AR = # of papers with an open but somewhat restricted license
F = # of papers that are freely available but not with an open license
C = some measure of how cheap the non freely available papers are

And so on.

Given that I am not into library science myself and am not really familiar with playing around with this type of data, I thought a much simpler approach would be to just go to Pubmed (which of course works only for publications in the areas covered by Pubmed).

From Pubmed one can pull out some simple data.

# of publications (for a person or Institution)
# of those publications in PubMed Central (a measure of free availability)

Thus one could easily measure the "Pubmed Central" index as

PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
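As a quick sketch of that calculation in Python (the function name `pmci` and the counts in the example are placeholders I made up, not anyone's real numbers):

```python
def pmci(in_pmc: int, in_pubmed: int) -> float:
    """PMCI = 100 * (# publications in PMC / # of publications in Pubmed)."""
    if in_pubmed <= 0:
        raise ValueError("need at least one Pubmed publication")
    if not 0 <= in_pmc <= in_pubmed:
        raise ValueError("PMC count must be between 0 and the Pubmed count")
    return 100.0 * in_pmc / in_pubmed


# Placeholder example: 30 of 40 Pubmed-indexed papers are also in PMC.
example_pmci = round(pmci(30, 40), 1)
print(example_pmci)
```

Rounding to one decimal place is just a display choice here; the index itself is the plain ratio times 100.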

Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.

Name	PMC / Pubmed	PMCI
Eisen JA	224/269	83.2
Eisen MB	76/104	73.1
Collins FS	192/521	36.8
Lander ES	160/377	42.4
Lipman DJ	58/73	79.4
Nussinov R	170/462	36.7
Mardis E	127/187	67.9
Colwell RR	237/435	54.5
Varmus H	165/408	40.4
Brown PO	164/234	70.1
Darling AE	20/27	74.0
Coop G	23/39	59.0
Salzberg SL	107/162	61.7
Venter JC	53/237	22.4
Ward NL	24/58	41.4
Fraser CM	78/262	29.8
Quackenbush J	95/225	42.2
Ghedin E	47/82	57.3
Langille MG	10/14	71.4

And so on. Obviously this is of limited value / accuracy in many ways. Many papers are freely available but not in Pubmed Central. Many papers are not covered by Pubmed or Pubmed Central. Times change, so some measure of recent publications might be better than measuring all publications. Author identification is challenging (until systems like ORCID get more use). And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC). This can be useful for cases where material is not put into PMC for some reason. And then with a similar search one can narrow this to just the last five years. As open access has become more common maybe some people have shifted to it more and more over time (I have -- so this search should give me a better index).
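For anyone who wants to script counts like these rather than click through the website, here is a rough Python sketch that only builds query URLs for NCBI's E-utilities `esearch` endpoint. The searches in this post were run by hand, so the exact search terms below (`pubmed pmc[sb]` and `free full text[sb]` as subset filters, and a five-year window approximated as 365 * 5 days via `reldate`) are my assumptions about a reasonable equivalent, not a record of what was actually typed in. The helper name `count_url` is made up, and fetching and parsing the response is left out.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(author: str, subset: str = "",
              last_n_years: Optional[int] = None) -> str:
    """Build an esearch URL that asks Pubmed for a count of matching records.

    author: author name as it appears in Pubmed, e.g. "Eisen JA"
    subset: optional subset filter, e.g. "free full text[sb]" (assumed filter)
    last_n_years: if given, restrict to roughly the last n years
    """
    term = f"{author}[Author]"
    if subset:
        term += f" AND {subset}"
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    if last_n_years is not None:
        # reldate is measured in days; 365 * n is only an approximation.
        params["datetype"] = "pdat"
        params["reldate"] = str(365 * last_n_years)
    return f"{ESEARCH}?{urlencode(params)}"


# Four queries per author: all papers, in PMC, free anywhere, free in last 5 years.
all_url = count_url("Eisen JA")
pmc_url = count_url("Eisen JA", "pubmed pmc[sb]")
free_url = count_url("Eisen JA", "free full text[sb]")
free_recent_url = count_url("Eisen JA", "free full text[sb]", 5)
print(free_recent_url)
```

Note that searching by author name like this has the same author identification problem mentioned above, so the counts it returns would still need a sanity check.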

Let's call the % of publications with free full text somewhere the "Free Index" or FI. Here are the values for the same authors.

Name	PMC / Pubmed	PMCI	Free / Pubmed (5 years)	FI-5	Free (all years)	FI-ALL
Eisen JA	224/269	83.2	178/180	98.9	237	88.1
Eisen MB	76/104	73.1	32/34	94.1	83	79.8
Collins FS	192/521	36.8	104/128	81.3	263	50.5
Lander ES	160/377	42.4	78/104	75.0	200	53.1
Lipman DJ	58/73	79.4	20/22	90.9	59	80.8
Mardis E	127/187	67.9	90/115	78.3	135	72.2
Colwell RR	237/435	54.5	31/63	49.2	258	59.3
Varmus H	165/408	40.4	21/28	75.0	206	50.5
Brown PO	164/234	70.1	20/21	95.2	185	79.0
Darling AE	20/27	74.0	18/21	85.7	21	77.8
Coop G	23/39	59.0	16/20	80.0	28	71.8
Salzberg SL	107/162	61.7	54/58	93.1	128	79.0
Venter JC	53/237	22.4	20/33	60.6	85	35.9
Ward NL	24/58	41.4	18/27	66.6	30	51.7
Fraser CM	78/262	29.8	9/13	69.2	109	41.6
Quackenbush J	95/225	42.2	54/75	72.0	131	58.2
Ghedin E	47/82	57.3	30/36	83.3	56	68.3
Langille MG	10/14	71.4	11/13	84.6	11	78.6

Very happy to see that I score very well for the last five years. 180 papers in Pubmed. 178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements. But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness. And such metrics should take into account sharing of other material like data, methods, etc. In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything. (Although I do remember someone out there - maybe Carl Bergstrom - saying there were some metrics that might be relevant - but can't figure out who / what this information in the back of my head is).

So I decided to do some searching anew. And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access. By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025

Here is the abstract:

Abstract
INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

The full PDF is available here.

I completely love it. After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics. I hope more things like this come out and are readily available for anyone to calculate. Just how open someone is could be yet another metric used to evaluate them ...

And then I did a little more searching and found the following which also seem directly relevant

So - it is good to see various people working on such metrics. And I hope there are more and more.

Anyway - I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there. I hope someone finds them useful ...

13 comments:

Michael Eisen, 6/16/2013 12:15 PM
You should count separately papers where the person is first, senior, or middle author, which, as you know, entail varying degrees of control over where papers are published.
David Basanta, 6/16/2013 12:19 PM
I agree with Michael, this approach is more suited to measure the openness of senior authors but that might be enough and if not, it's a great start.
BenK, 6/16/2013 12:44 PM
For a long time (8 years?) I've been on-and-off wrestling with an OA metric question: to what extent has data been exploited (the background question for me was economic exploitation) prior to sharing? How much opportunity cost is being given up by the act of sharing this specific extent of data at the present time?

Now, this is a very different question than the one Jon is asking. What I'm asking is about data sets and less about publications; but it can be connected 'around the back' through the idea of merit and opportunity cost.

How much does OA cost different authors? Do they have to pay out of grants - if so, at what opportunity cost to new projects, students, equipment? Do they pay with lower article metrics and thus via their CV? What is the real cost to the author to make these publications more broadly available? How much did they give up to do this good deed?

Indirectly, this addresses the question: How much honor should they be accorded as a result? Was it somehow self-serving (and thus a fine judgement call but not independently meritorious)? Or was it self-sacrificial and consciously for the good of the scientific community or general public?
Heather Morrison, 6/16/2013 2:34 PM
Did you notice that Swartz' Guerilla Open Access Manifesto is not CC licensed at all? http://cryptome.org/2013/01/swartz-open-access.htm

I don't think Aaron Swartz was obsessed with metrics, this doesn't seem to be his style at all.

My own perspective is that we should move away from our societal obsession with metrics - we need less of this, not openness metrics.

Heather Morrison, 6/16/2013 2:51 PM
Demand Progress, the site Aaron Swartz started, is licensed CC-BY-NC-SA. http://www.demandprogress.org/

Reddit is All Rights Reserved.

My suggestion for honouring Swartz today is to stand up for others who are courageous enough to take risks to make things open that should be open. Sign the petition to pardon Edward Snowden: https://petitions.whitehouse.gov/petition/pardon-edward-snowden/Dp03vGYD Join the call to free Bradley Manning.
Heather Morrison, 6/16/2013 3:09 PM
This Guardian article says what I just said above but much better - the whistleblowers are the next generation of American patriots http://www.guardian.co.uk/commentisfree/2013/jun/16/whistleblowers-new-generation-american-patriots
Mr. Gunn, 6/17/2013 11:36 AM
Part of me thinks Jonathan just did this to show he beats his brother, but to be serious for a moment, maybe we could just stop these silly arguments about which license is best for everyone, and let the humanities put whatever NC-SA-ND stuff they want on their stuff, as long as they don't confuse people and distinguish their now restrictive practices from fully open OA?
Heather Morrison, 6/17/2013 8:48 PM
Mr. Gunn, our interests in this matter are quite different. You are in the world of industry, benefiting financially from the gifts of others. My concern is building a global sustainable knowledge commons to serve the interests of scholarship and the public. If businesses can make a profit along the way, that's a good thing, but it's not the point and problematic when profit becomes the priority.
Anonymous, 6/27/2013 2:23 AM
Just to say that PLOS did re-release the Open Access Spectrum under a CC BY license which I agree was the right way to do it.