Diana Hicks & Cassidy Sugimoto
In August 2013, President Obama released the “Plan to Make College More Affordable: A Better Bargain for the Middle Class,” which announced the Administration’s intention to develop a rating system for colleges, with results eventually to be linked to federal student aid. Public comments were solicited, and the Association of American Universities (AAU) was among the many who submitted comments. The AAU objected to the proposed college rating system.
In the meantime, faculty at AAU universities have noted a fondness on the part of their provosts for the rating system sold by Academic Analytics. Noting the parallels between the two issues, we hereby submit a reworking of the first AAU public comment submitted to the Department of Education in response to the Administration’s call for a college rating system: a letter dated December 2, 2013, from Hunter R. Rawlings III, President of the Association of American Universities, to Jamie Studley, Deputy Under Secretary, Department of Education. (Unfortunately, the Department of Education has removed the public comments from its site, so the original letter is no longer available. We can provide it upon request.)
We would like to respond to the growing trend of institutions purchasing and using individual faculty performance tracking tools, such as Academic Analytics. We applaud our provosts’ focus on research excellence at a time when Americans are concerned with staying scientifically competitive and addressing global grand challenges. AAU member institutions strive to provide the highest quality research possible and to make it accessible through a range of dissemination measures. We are encouraged by many of the goals outlined in the strategic plans of many AAU provosts, which demonstrate a need for higher research performance. This response is focused on the metrics for research evaluation, and the tools purchased to generate these.
We do not endorse these tools, but we would like to share our views. As articulated in prior statements about college ratings systems, we believe that any tools designed to be useful to administrators and science policymakers should be grounded in data that are reliable and valid, and that are presented with the appropriate context to accurately reflect institutional performance. In preparing this response, we considered the reliability and validity, as well as methodological limitations, of possible metrics related to research excellence. The sheer volume of literature on this topic from experts around the world speaks to the inherent complications and controversies for this, or any, rating system. We have summarized the concerns below and divided them into (1) general feedback on rating systems, (2) data sources and definitional issues, and (3) peer groups. We have also provided specific feedback for one particular tool, Academic Analytics, and the degree to which this tool fails to meet the principles outlined in the Leiden Manifesto, an internationally influential statement of best practice in metrics for research evaluation published in Nature in 2015.
Limitations of Rating Systems
Despite the proliferation of college rating and ranking systems over the last several years, developing a valid and useful rating system is a daunting endeavor. Given the diversity of higher educational institutions (including their varying sizes, locations, and missions), the shortcomings of available data, and the breadth of institutional objectives, the exercise becomes even more controversial and complex. While a great deal of information is available about institutions drawn from a variety of sources, combining a particular set of measures into a single rating raises important questions: What methodology will be used to generate the ratings? What measures will be used? How will different measures be combined and weighted? What are the appropriate peer groups for comparison? Do these variables (methodology, measures, weights, peer groups) vary for different objectives? Depending on the answers to these questions, a system may be biased against some departments and toward others, leading to unintended consequences for faculty and institutions.
One central dilemma, especially if such a rating system is intended for chairs and other administrators, is that many of the available measures and metrics require additional context to be useful. Yet providing more context for individual measures seems in opposition to the approach of combining measures into a single metric. Different types of fields have different scholarly output and may require different metrics to capture their strengths and weaknesses. A one-size-fits-all approach fails to capture the diversity of the scholarly communication system.
A further difficulty is that a rating system that looks at research outcomes will of necessity consider both the inputs and outputs of scholarship. Yet these are not independent of one another: designing a rating system that accounts for these relationships in a way that does not provide incentives to faculty to engage in undesirable behaviors will be a major challenge.
Data Sources and Definitional Issues
Even if a perfect methodology for developing indicators existed, it would be meaningless without good data. Indicators today rely on many extant data sources, primarily Thomson Reuters’ Web of Science, Elsevier’s Scopus, and publicly available grant databases. While sources such as the Web of Science were initially developed to facilitate collection development in libraries, over the years they have become a tool for measuring individual faculty productivity and impact, with mixed results. Fundamental limitations of using these data at the individual or field level include disciplinary and language biases in coverage, as well as lack of representativeness in the scholarly products covered.
A rating system must be equipped to deal with missing or limited data, and the use of inappropriate denominators to calculate ratios and percentages is a concern in the construction of many metrics, such as the Journal Impact Factor. Overall, while bibliometric data can be informative for administrators, they may not provide the whole picture, and require contextualization to be most useful. Bibliometric data are inappropriate as the basis for a system that will ultimately make determinations about the allocation of resources and rewards to individual faculty members and departments.
Vague definitions of data specifications or policy terms can also be a problem. Often, it is the availability of extant data that determines the definition of policy terms such as productivity and impact, rather than stakeholders first agreeing on definitions, with the search for the right data following suit. Such an approach may lead to measuring what’s available, rather than what stakeholders truly care about.
Regardless of the final data sources for a system, provosts should ensure that all data are publicly available and made available to faculty and departments for review prior to distribution and use.
Peer Groups
There is much concern among faculty about how peer groups are defined in indicators. Higher education institutions are nearly as diverse as they are numerous, and systems to categorize institutions inevitably have shortcomings. While groupings like the Carnegie Classification capture many of an institution’s core characteristics, no classification system captures features such as whether a department emphasizes certain subdisciplinary areas, or promotes different forms of scholarly output and creative activity.
A related issue involves how performance within a rating system will be measured within peer groups: Will faculty be compared to one another, to themselves over time, or to an external benchmark of some kind? We caution that comparisons within any peer group necessarily lead to “winners” and “losers.” Such an approach seems inconsistent with the goals of encouraging all faculty to innovate and improve their performance and can lead to goal displacement activities.
Beyond the more general issues outlined above, we wish to provide comments on a specific tool currently in use, which has been purchased by several AAU institutions. We will describe the ways in which this tool—Academic Analytics—violates the principles outlined in the Leiden Manifesto. This tool serves as an example of the types of products that we oppose.
Quantitative evaluation should support qualitative, expert assessment. Academic Analytics is, at present, a purely quantitative evaluation metric that includes no expert assessment. Furthermore, the reductionist approach to visualizing the data in a spider diagram—the hallmark of Academic Analytics—tempts administrators to substitute indicators for informed judgement.
Measure performance against the research missions of the institution, group or researcher. Academic Analytics, through its peer comparison metrics, falsely assumes homogeneity among research institutions and departments, despite substantial evidence that performance metrics vary widely by subfield and topical focus. Furthermore, it assumes that excellence is achieved through competition with peer groups, rather than through alignment with institutional objectives and mission.
Protect excellence in locally relevant research. Many AAU institutions are state-funded institutions with a responsibility to outreach and service-oriented research in the local community. The output and impact of this research may not be covered in standard bibliometric databases. Therefore, tools that rely on such data will inevitably discount these activities. Furthermore, the ubiquitous spider diagram implicitly reinforces the idea that profiles should be similar among these institutions.
Keep data collection and analytical processes open, transparent and simple. The lack of transparency in Academic Analytics’ data is perhaps the most fundamental shortcoming. Academic Analytics does not make public the data used or the mechanisms for generating these data. Without this information, proper validation is impossible. That Academic Analytics treats its data as top secret was confirmed when its license was exposed in a dispute with the Rutgers faculty union. The license restricts data access to those who “hold a position that involves internal strategic decision making and evaluation of productivity” and have been approved (and only allows for a “reasonable” number of approved users). This means that Academic Analytics’ data about faculty are not made available to all faculty members. This is antithetical to the values of many AAU institutions, particularly public institutions, which allow faculty to access the evaluations made of them. Each faculty member should have access to their data and opportunities for redress.
Allow those evaluated to verify data and analysis. Academics who have forced their way into Academic Analytics’ data trove have found it to be inaccurate, perhaps explaining the aforementioned secrecy. Through a New Jersey Open Public Records Act request, the President of the Rutgers faculty union, anthropologist David M. Hughes, obtained his data. Comparing his own records with those held by Academic Analytics (his/AA’s) shows: 1/3 articles, 2/2 books, 100s/0 citations, $37,500/$0 grants. As the Leiden Manifesto stated: “Accurate, high-quality data take time and money to collate and process.” Enabling those being evaluated to verify data accuracy is essential.
Account for variation by field in publication and citation practices. Considerable research has gone into field normalization for bibliometric measures. The degree to which Academic Analytics complies with these techniques is unknown, given the lack of transparency around their algorithms.
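To illustrate what field normalization involves, here is a minimal sketch in Python. It assumes hypothetical papers, field labels, and citation counts, and it uses one simple approach from the bibliometrics literature: dividing each paper's citations by the mean for its field. It does not describe Academic Analytics' method, which is undisclosed.

```python
from collections import defaultdict
from statistics import mean


def field_normalized_scores(papers):
    """Return citations divided by the field mean for each paper.

    `papers` is a list of (paper_id, field, citations) tuples.
    All identifiers and counts used with this function are hypothetical.
    """
    # Group citation counts by field to compute each field's baseline.
    by_field = defaultdict(list)
    for _, field, citations in papers:
        by_field[field].append(citations)
    field_mean = {field: mean(counts) for field, counts in by_field.items()}

    scores = {}
    for paper_id, field, citations in papers:
        baseline = field_mean[field]
        # A field with no citations at all has no usable baseline.
        scores[paper_id] = citations / baseline if baseline > 0 else None
    return scores


# Hypothetical data: raw counts differ tenfold between the two fields.
papers = [
    ("bio-1", "cell biology", 40),
    ("bio-2", "cell biology", 60),
    ("anth-1", "anthropology", 4),
    ("anth-2", "anthropology", 6),
]
scores = field_normalized_scores(papers)
```

In this made-up example, a paper with 4 citations in anthropology and a paper with 40 citations in cell biology receive the same normalized score, because each sits at the same position relative to its own field's average. A real implementation would also need to account for publication year and document type, and the result depends heavily on how fields are delineated.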
Base assessment of individual researchers on a qualitative judgement of their portfolio. Academic Analytics is inherently reductionist and quantitative, reducing departments and individuals to a spider diagram.
Avoid misplaced concreteness and false precision. Academic Analytics’ numerical accounts of productivity and impact should perhaps be accompanied by estimation of the false positives and false negatives in the data, given known problems with its data accuracy.
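As a rough illustration of how such error estimates could be produced, the sketch below compares a vendor's list of a faculty member's publications against a list verified by that faculty member. The function name, record identifiers, and data are all hypothetical, and the sketch assumes records can be matched by a shared identifier, which is itself often difficult in practice.

```python
def audit_records(vendor_records, verified_records):
    """Compare vendor-attributed records with a verified list.

    Both arguments are iterables of record identifiers (hypothetical here).
    Returns false positives, false negatives, precision, and recall.
    """
    vendor = set(vendor_records)
    verified = set(verified_records)

    matched = vendor & verified
    false_positives = vendor - verified   # attributed by vendor, not verified
    false_negatives = verified - vendor   # verified, but missing from vendor

    return {
        "false_positives": sorted(false_positives),
        "false_negatives": sorted(false_negatives),
        # Share of the vendor's records that are correct.
        "precision": len(matched) / len(vendor) if vendor else None,
        # Share of the verified records that the vendor captured.
        "recall": len(matched) / len(verified) if verified else None,
    }


# Hypothetical example: three vendor records, two verified records.
report = audit_records(
    vendor_records=["art-A", "art-B", "art-C"],
    verified_records=["art-A", "art-D"],
)
```

Reporting rates such as these alongside each indicator would make the uncertainty visible, but doing so requires exactly the access to underlying data that those being evaluated currently lack.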
Recognize the systemic effects of assessment and indicators. As stated in the Leiden Manifesto: “Indicators change the system through the incentives they establish.” By stressing peer comparison, Academic Analytics incentivizes publication in the forms that the system measures and disincentivizes activities that do not directly contribute to its indicators. This can lead to serious goal displacement, as faculty work to maximize Academic Analytics’ metrics rather than to advance institutional or disciplinary objectives. Provosts thereby cede control over research strategies to companies like Academic Analytics.
Scrutinize indicators regularly and update them. The lack of transparency in Academic Analytics makes validation efforts nearly impossible. This is a serious shortcoming of the tool and the contracts AAU institutions have established with this company.
Finally, it goes without saying that the value of a research enterprise goes beyond maximizing papers and citations. Yet a ratings approach risks quantifying a complex variable like research quality in a way that fails to provide valuable information to administrators, while simultaneously ignoring and excluding the non-quantifiable ways research contributes toward our nation’s economic and societal goals. A primary focus on papers as a measure of institutional performance also exacerbates the recent trend of focusing almost exclusively on the numbers. It unwisely diminishes the social value that a faculty member provides to his or her community and our nation.
Thank you in advance for your consideration of the views of AAU faculty. We look forward to a conversation in the coming months about how we can enhance research excellence while protecting the integrity of the higher education enterprise.