During our data mining project for the European Commission, one of the case studies we undertook to test our framework for guiding data mining in policy research explored open access (OA) publishing in the European context. Specifically, we aimed to determine whether institutional OA policies affect the share of an institution’s papers available in OA and, if so, to what degree. An answer to this question would provide actionable advice for institutions looking to increase the availability of their research. Here’s what we found.
The evolving OA movement
First, let’s review some key elements of the OA discussion. There are several concerns about keeping publications behind costly paywalls.
- It steepens inequalities within the academic sphere between the wealthier institutions that can afford (some) expensive journal subscriptions and those that can’t.
- It creates barriers to the discovery of research within the research community, making existing work harder to find and reuse efficiently.
- It reinforces the gap between the research sphere and the rest of society.
Furthermore, there are rumours of a citation advantage for OA papers, and citations are an important currency of professional capital in academia. There is also a growing number of directives requiring that publications supported by public funds be made available to the public that funded them.
Despite these pressures for change, scholarly communication has not spontaneously converted to a fully OA model. Among the forces slowing the conversion are the inertia of academic practice and the vested interests of academic publishers, who protect their very lucrative pay-for-access business model through lobbying and litigation.
Measuring policies that promote OA
Vincent-Lamarre and colleagues, in a paper published in 2016, examined how effectively OA policies actually increase OA levels for an institution’s publications, and they arrived at some tangible recommendations for designing the most effective OA policy. Their approach was to assess the share of an organization’s papers actually deposited in institutional repositories (using the Registry of Open Access Repositories [ROAR] database) and determine whether having an OA policy in place (as catalogued in the Registry of Open Access Repository Mandates and Policies [ROARMAP] database) had any effect on deposition rates. They also examined which policy features (standardized, catalogued and scored in the MELIBEA database) had the greatest effects on OA levels.
Their findings were that having an OA policy in place does indeed contribute to higher levels of OA, and that three policy design features in particular were correlated with greater increases still:
- the requirement that articles be deposited immediately upon acceptance for publication;
- the requirement that only deposited articles be considered in tenure and promotion reviews; and
- allowing researchers to unconditionally opt out of the OA component of the repository’s function.
The study authors explain these points nicely. Authors who don’t deposit their papers immediately upon acceptance are more likely to forget to do so later; they’re more likely to comply if their professional advancement depends on deposition; and their copyright concerns about OA are alleviated if they feel they’re allowed to opt out of the OA condition, even if they rarely end up doing so.
Refined study design, using big data
Some components of that study design miss important elements of the policy context. For instance, not all items in an institutional repository are necessarily available in OA (some are not accessible outside the institution), and many items available in OA are provided through other avenues (such as arXiv and other subject-specific repositories). This means that repository deposition tracks OA only indirectly, and ultimately gives a fuzzy picture of how far barriers to the discovery and reuse of research have been lowered.
Our own study design retains some elements of the Vincent-Lamarre study (specifically, we keep the ROARMAP and MELIBEA data sources for information about OA policies), but we use the Web of Science (WoS) and 1science databases to obtain a more reliable indicator of institutional OA availability rates. Neither source is exhaustive: WoS does not index all scholarly publications, and the 1science database captures publications available through major OA repositories but misses some smaller venues. Nonetheless, this approach seems to provide a more direct measure of OA levels.
Knowing that there are important differences across national contexts, we decided to compare institutions within countries, focusing specifically on the UK and Spain. ROARMAP is a UK-based initiative, so its coverage of OA policies in that country is relatively good. By contrast, far fewer Spanish institutions have policies in the database, which may reflect coverage issues or a weaker tendency towards OA policies at the institutional level. Standardizing the institutional affiliation information in the bibliometric data enabled us to retrieve WoS publications for each academic institution in the UK and in Spain, as the basis for assessing policy effects.
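The institution-level indicator described above comes down to two steps: map each paper's raw affiliation string to a standardized institution, then compute the share of that institution's papers flagged as available in OA. The sketch below illustrates this under stated assumptions: the `AFFILIATION_MAP` lookup table, the `(raw_affiliation, is_oa)` record layout and the toy records are all hypothetical stand-ins rather than the actual WoS or 1science schemas, each record is treated as one paper–institution pair, and real affiliation standardization involves far more variant handling than a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical lookup from cleaned affiliation strings to a standardized
# institution name (real disambiguation is far more involved than this).
AFFILIATION_MAP = {
    "univ oxford": "University of Oxford",
    "university of oxford": "University of Oxford",
    "oxford univ": "University of Oxford",
    "univ barcelona": "University of Barcelona",
    "universitat de barcelona": "University of Barcelona",
}


def normalize(raw: str) -> str:
    """Lower-case, replace commas and periods with spaces, collapse whitespace."""
    cleaned = raw.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def oa_share_by_institution(papers):
    """Return {institution: share of its papers available in OA}.

    `papers` is an iterable of hypothetical (raw_affiliation, is_oa) pairs;
    affiliations missing from AFFILIATION_MAP are skipped rather than guessed.
    """
    totals = defaultdict(int)
    oa_counts = defaultdict(int)
    for raw_affiliation, is_oa in papers:
        institution = AFFILIATION_MAP.get(normalize(raw_affiliation))
        if institution is None:
            continue
        totals[institution] += 1
        if is_oa:
            oa_counts[institution] += 1
    return {inst: oa_counts[inst] / totals[inst] for inst in totals}


# Toy records for illustration only (not data from the study).
papers = [
    ("Univ. Oxford", True),
    ("University of Oxford", False),
    ("Oxford Univ", True),
    ("Universitat de Barcelona", False),
    ("Univ Barcelona", True),
    ("Unknown Institute", True),
]
print(oa_share_by_institution(papers))
```

Skipping unmatched affiliations keeps the share conservative; in practice, unmatched strings would be reviewed and added to the standardization table rather than silently dropped.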
Our results confirmed the broadest finding of Vincent-Lamarre and colleagues: for UK universities with an OA policy in place, the average share of papers available in OA was about seven percentage points higher than for those with no OA policy (50.8% vs. 43.5%). The difference between median shares was even greater, at about 10.6 percentage points (53.7% vs. 43.1%). OA levels were higher still for organizations that had had an OA policy in place for three years or more. We tested these differences for UK universities using a Mann–Whitney U test.
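For readers unfamiliar with it, the Mann–Whitney U test compares two groups (here, institutions with and without an OA policy) by ranking all values together rather than assuming normally distributed OA shares. The dependency-free sketch below shows the mechanics; it assumes a two-sided test with a tie-corrected normal approximation and no continuity correction, since the exact settings used in our analysis are not stated here. In practice one would typically call `scipy.stats.mannwhitneyu` instead. The helper names and the toy OA shares are hypothetical and are not the study's data.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann–Whitney U test using a normal approximation.

    Returns (U statistic for sample x, two-sided p-value). The variance
    includes a tie correction; exact p-values are preferable for very
    small samples.
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Toy OA shares (%) for two hypothetical groups of institutions; invented
# for illustration only.
with_policy = [55.1, 48.3, 61.0, 52.7, 44.0, 58.4]
without_policy = [41.2, 45.0, 39.8, 47.5, 43.3]

mean_diff = statistics.mean(with_policy) - statistics.mean(without_policy)
median_diff = statistics.median(with_policy) - statistics.median(without_policy)
u, p = mann_whitney_u(with_policy, without_policy)
print(f"mean diff = {mean_diff:.1f} pp, median diff = {median_diff:.1f} pp")
print(f"U = {u}, two-sided p = {p:.4f}")
```

Because the test works on ranks, a few institutions with unusually high or low OA shares cannot dominate the result, which suits small groups of universities whose shares may be skewed.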
For Spain, too few institutions appeared in ROARMAP for statistically significant differences to be detected. We also explored various policy design features, as Vincent-Lamarre and colleagues did, but found none with a statistically significant influence on OA levels.
What we found
The policy implication of our pilot study is that institutional OA policies are a viable option for increasing the share of publications available in OA. Furthermore, our findings suggest that getting a policy in place and implemented early has a significant influence on OA levels; accordingly, it’s important to get the policy consultation process underway within an institution, without worrying unduly about crafting the perfect policy before implementation.
We endorse the policy design proposed by Vincent-Lamarre and colleagues as a good starting point: they showed positive outcomes for certain design elements, and nothing in our study contradicts their conclusions (although, using a broadly similar study design with different data sources, we were not able to reproduce their findings on specific design features). We therefore carry their proposal forward, adding the encouragement to start work on an OA policy as soon as possible.
This case study also yielded some data mining lessons. Peer-reviewed papers can offer valuable study designs and data sources as inspiration for building a new study that draws on big data sources. We already knew that combining novel data sources with established ones provides a foundation for understanding the novel source; we learned that mirroring the design of a previous study similarly helps guide data analysis as we grapple with novel data sources. Furthermore, mirroring a study design eases the integration of novel findings with the existing body of knowledge. The case study also underlined the value of policy repositories as a data source, both to identify the presence or absence of policies and to provide structured information about their designs. Lastly, it highlighted the importance of collecting baseline measures against which future results can be compared in order to assess impact.
Science-Metrix’s final report for this data mining project is available from the Publications Office of the European Union.
Data Mining. Knowledge and technology flows in priority domains within the private sector and between the public and private sectors. (2017). Prepared by Science-Metrix for the European Commission. ISBN 978-92-79-68029-8; DOI 10.2777/089
All views expressed are those of the individual author and are not necessarily those of Science-Metrix or 1science.