For many years I have been raising a key question in relation to open access publishing: how can we measure how open someone's publications are? Ideally we would have a way of measuring this in some sort of index. A few years ago I looked around and asked around and did not find anything out there of obvious direct relevance to what I wanted, so I started mapping out ways to do this.
When Aaron Swartz died I started drafting some ideas on this topic. Here is what I wrote (in January 2013) but never posted:
With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short term solution) and moving more towards openaccess publishing (a long term solution). One key component of the move to more openaccess publishing will be assessing people on just how good a job they are doing of sharing their academic work.
I have looked around the interwebs to see if there is some existing metric for this and I could not find one. So I have decided to develop one - which I call the Swartz Openness Index (SOI).
Let A = # of objects being assessed (could be publications, data sets, software, or all of these together).
Let B = # of objects that are released to the commons with a broad, open license.
A simple (and simplistic) metric could be simply
OI = B / A
This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.
A and B as above.
Let C = # of objects available free of charge but not openly
OI = ( B + (C/D) ) / A
where D is the "penalty" for making material in C not openly available
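As a rough illustration, the formula above can be sketched in a few lines of Python. The function name, the argument names and the default penalty of 2.0 are assumptions made for this example only; the draft does not propose a particular value for D.

```python
def openness_index(total, open_licensed, free_not_open=0, penalty=2.0):
    """Compute OI = (B + C/D) / A using the symbols defined above.

    total         -> A, number of objects being assessed
    open_licensed -> B, objects released with a broad, open license
    free_not_open -> C, objects free of charge but not openly licensed
    penalty       -> D, divisor applied to C (2.0 is an arbitrary
                     illustrative default, not a recommended value)
    """
    if total <= 0:
        raise ValueError("A (total) must be positive")
    if penalty < 1:
        raise ValueError("D (penalty) below 1 would reward non-open objects")
    if open_licensed + free_not_open > total:
        raise ValueError("B + C cannot exceed A")
    return (open_licensed + free_not_open / penalty) / total


# Hypothetical counts: 10 objects, 6 openly licensed, 2 free but not open.
print(openness_index(10, 6, 2, penalty=2.0))  # (6 + 2/2) / 10 = 0.7
```

With C = 0 this reduces to the simpler OI = B / A, and a larger D pushes the score for free-but-not-open material closer to zero.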
This still seems not detailed enough. A more detailed approach might be to weight diverse aspects of the openness of the objects. Consider for example the "Open Access Spectrum." This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability. And each of these is given different categories that assess the level of openness. Seems like a useful parsing in ways. Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND license I cannot technically make derivatives of it. So I will not. Mostly because I am pissed at PLoS and SPARC for releasing something in this way. Inane.
But I can make my own openness spectrum.
And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use. I had a heated discussion with people from PLOS and SPARC about this but I am not sure if they updated their policy. Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill. And it only just now came back to me. (Though I note - I did not find the Draft post I made until AFTER I wrote the rest of this post below ... ).
To get some measure of openness in publications maybe a simple metric would be useful. Something like the following
- P = # of publications
- A = # of fully open access papers
- OI = Openness index
A simple OI would be
OI = A / P
However, one might want to account for relative levels of openness in this metric. For example
- AR = # of papers with an open but somewhat restricted license
- F = # of papers that are freely available but not with an open license
- C = some measure of how cheap the non freely available papers are
And so on.
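One possible way to fold these categories into a single number is a weighted sum. This is only a sketch: the weights w_AR and w_F are hypothetical and do not come from any existing standard, and C is left out because "how cheap" is not yet defined precisely.

```latex
\[
\mathrm{OI}_w \;=\; \frac{A + w_{AR}\,AR + w_{F}\,F}{P},
\qquad 0 \le w_{F} \le w_{AR} \le 1
\]
```

For example, with P = 10, A = 5, AR = 2, F = 2, w_AR = 0.75 and w_F = 0.5 this gives (5 + 1.5 + 1) / 10 = 0.75. With both weights set to zero it reduces to A / P, the fraction of fully open access papers.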
Given that I am not into library science myself and not really familiar with playing around with this type of data I thought a much simpler metric would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).
From Pubmed one can pull out some simple data.
- # of publications (for a person or Institution)
- # of those publications in PubMed Central (a measure of free availability)
Thus one could easily measure the "Pubmed Central" index as
PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
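The calculation is simple enough to sketch in Python. The function name is made up for this example, and the counts are the 76 and 104 reported for one of the authors in the list that follows.

```python
def pmc_index(in_pmc, in_pubmed):
    """PMCI = 100 * (# publications in PMC / # publications in Pubmed)."""
    if in_pubmed <= 0:
        raise ValueError("need at least one publication in Pubmed")
    if not 0 <= in_pmc <= in_pubmed:
        raise ValueError("PMC count must be between 0 and the Pubmed count")
    return 100.0 * in_pmc / in_pubmed


# 76 of 104 Pubmed-indexed papers are in PubMed Central.
print(round(pmc_index(76, 104), 1))  # 73.1
```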
Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.
| Name | PMC / Pubmed | PMCI |
|---|---|---|
| Eisen JA | 224/269 | 83.2 |
| Eisen MB | 76/104 | 73.1 |
| Collins FS | 192/521 | 36.8 |
| Lander ES | 160/377 | 42.4 |
| Lipman DJ | 58/73 | 79.4 |
| Nussinov R | 170/462 | 36.7 |
| Mardis E | 127/187 | 67.9 |
| Colwell RR | 237/435 | 54.5 |
| Varmus H | 165/408 | 40.4 |
| Brown PO | 164/234 | 70.1 |
| Darling AE | 20/27 | 74.0 |
| Coop G | 23/39 | 59.0 |
| Salzberg SL | 107/162 | 61.7 |
| Venter JC | 53/237 | 22.4 |
| Ward NL | 24/58 | 41.4 |
| Fraser CM | 78/262 | 29.8 |
| Quackenbush J | 95/225 | 42.2 |
| Ghedin E | 47/82 | 57.3 |
| Langille MG | 10/14 | 71.4 |
And so on. Obviously this is of limited value / accuracy in many ways. Many papers are freely available but not in Pubmed Central. Many papers are not covered by Pubmed or Pubmed Central. Times change, so some measure of recent publications might be better than measuring all publications. Author identification is challenging (until systems like ORCID get more use). And so on.
Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC). This can be useful for cases where material is not put into PMC for some reason. And then with a similar search one can narrow this to just the last five years. As openaccess has become more common maybe some people have shifted to it more and more over time (I have -- so this search should give me a better index).
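For anyone who wants to script these searches rather than run them by hand, here is a minimal sketch using NCBI's E-utilities `esearch` endpoint. To be clear about the assumptions: I ran my searches through the Pubmed web interface, so the helper names and the exact filter strings (`pubmed pmc[sb]`, `free full text[sb]`, `"last 5 years"[dp]`) are my best guess at equivalent queries and should be checked against current Pubmed documentation before relying on the counts.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_terms(author):
    """Return the Pubmed query strings needed for the indexes (assumed filters)."""
    base = f"{author}[Author]"
    return {
        "all": base,
        "pmc": f"{base} AND pubmed pmc[sb]",
        "free": f"{base} AND free full text[sb]",
        "all_5y": f'{base} AND "last 5 years"[dp]',
        "free_5y": f'{base} AND free full text[sb] AND "last 5 years"[dp]',
    }


def esearch_url(term):
    """Build an esearch URL that asks for a JSON response."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "json"})


def parse_count(payload):
    """Pull the hit count out of an esearch JSON response body."""
    return int(json.loads(payload)["esearchresult"]["count"])


def fetch_count(term):
    """Run one search over the network and return the number of hits."""
    with urlopen(esearch_url(term)) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    terms = build_terms("Eisen JA")
    free_5y = fetch_count(terms["free_5y"])
    all_5y = fetch_count(terms["all_5y"])
    print(f"free full text, last 5 years: {free_5y}/{all_5y}")
```

Counts from a script like this will drift over time as new papers are indexed, and author-name searches will pick up other people with the same name and initials.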
Let's call the % of publications with free full text somewhere the "Free Index" or FI. Here are the values for the same authors.
| Name | PMC / Pubmed | PMCI | Free / Pubmed (5 years) | FI-5 | Free in Pubmed (all) | FI-ALL |
|---|---|---|---|---|---|---|
| Eisen JA | 224/269 | 83.2 | 178/180 | 98.9 | 237 | 88.1 |
| Eisen MB | 76/104 | 73.1 | 32/34 | 94.1 | 83 | 79.8 |
| Collins FS | 192/521 | 36.8 | 104/128 | 81.3 | 263 | 50.5 |
| Lander ES | 160/377 | 42.4 | 78/104 | 75.0 | 200 | 53.1 |
| Lipman DJ | 58/73 | 79.4 | 20/22 | 90.9 | 59 | 80.8 |
| Mardis E | 127/187 | 67.9 | 90/115 | 78.3 | 135 | 72.2 |
| Colwell RR | 237/435 | 54.5 | 31/63 | 49.2 | 258 | 59.3 |
| Varmus H | 165/408 | 40.4 | 21/28 | 75.0 | 206 | 50.5 |
| Brown PO | 164/234 | 70.1 | 20/21 | 95.2 | 185 | 79.0 |
| Darling AE | 20/27 | 74.0 | 18/21 | 85.7 | 21 | 77.8 |
| Coop G | 23/39 | 59.0 | 16/20 | 80.0 | 28 | 71.8 |
| Salzberg SL | 107/162 | 61.7 | 54/58 | 93.1 | 128 | 79.0 |
| Venter JC | 53/237 | 22.4 | 20/33 | 60.6 | 85 | 35.9 |
| Ward NL | 24/58 | 41.4 | 18/27 | 66.6 | 30 | 51.7 |
| Fraser CM | 78/262 | 29.8 | 9/13 | 69.2 | 109 | 41.6 |
| Quackenbush J | 95/225 | 42.2 | 54/75 | 72.0 | 131 | 58.2 |
| Ghedin E | 47/82 | 57.3 | 30/36 | 83.3 | 56 | 68.3 |
| Langille MG | 10/14 | 71.4 | 11/13 | 84.6 | 11 | 78.6 |
Very happy to see that I score very well for the last five years. 180 papers in Pubmed. 178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements. But most of my non genome report papers are also freely available.
I think in general it would be very useful to have measures of the degree of openness. And such metrics should take into account sharing of other material like data, methods, etc. In a way this could be a form of the altmetric calculations going on.
But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything. (Although I do remember someone out there - maybe Carl Bergstrom - saying there were some metrics that might be relevant - but can't figure out who / what this information in the back of my head is).
So I decided to do some searching anew. And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access. By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.
Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025
Here is the abstract:
Abstract
INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.
I completely love it. After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics. I hope more things like this come out and are readily available for anyone to calculate. Just how open someone is could be yet another metric used to evaluate them ...
And then I did a little more searching and found the following which also seem directly relevant
So - it is good to see various people working on such metrics. And I hope there are more and more.
Anyway - I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there. I hope someone finds them useful ...