Databases never include everything. This is one of the most important reasons to know your data source: by knowing what is excluded, you can better understand the counts produced and arrive at a correct interpretation. The switch from the SCI/SSCI CD-ROM to Scopus in NSF's Science and Engineering Indicators (SEI) 2016 provides a rare opportunity to compare the analytical virtues of two databases head to head. Because we can bring in other system-level data, funding, at some level of disaggregation, we can investigate whether our representation of the science system improves. It appears that our picture of U.S. science has improved; in particular, computing and engineering are no longer discriminated against, and even beyond those two fields the representation of fields in publications seems better aligned with the picture obtained from funding data. The figure reports the "prices" of the papers produced by U.S. universities across fields of science as they would appear if papers were counted in the SCI/SSCI CD-ROM version and in Scopus, assuming that only money expended in 2012 under the research accounts of universities was used to produce each paper, that each author spent the same fraction of the notional total paper budget, and that equal amounts of that budget came from any coauthors in other sectors or countries.
Since 1972, the U.S. National Science Foundation (NSF) has produced a biennial report for presentation to the President and Congress entitled Science and Engineering Indicators (SEI); the latest edition was released in January 2016. The report has always contained time-series bibliometric data at the country and field level drawn from the Science Citation Index (SCI), and in recent decades has also incorporated Social Sciences Citation Index (SSCI) data. As with all government data, consistent time series are important, and over the decades this created tension in tracking the growth of scientific publication. In the early years, a "fixed journal set" was used, meaning that over a decade only journals indexed in the database at the beginning of that decade were included in the paper count. The idea was that the count would not be compromised by the database company adding a large batch of journals in a given year. Though the consistency was admirable, it was attained only by suppressing the apparent growth rate of the scientific enterprise. The fixed journal set was dropped more than a decade ago, which meant that the journal accession policy of Thomson Reuters was baked into the apparent expansion of science as measured in SEI.
However, consistent data remained a concern, and SEI did not follow the expansions of Thomson Reuters' offerings included in the online Web of Science collection: the Arts and Humanities Citation Index (AHCI), SCI Expanded, and the Conference Proceedings Citation Index. Rather, SEI maintained its allegiance to the smaller dataset of the SCI and SSCI on CD-ROM. This changed with the 2016 edition, in which a new contractor, Science-Metrix through SRI International, produced the data, and Scopus became the basis of the counts. The switch provides an opportunity to see the difference a database makes.
Although SEI says the switch provides access to a broader set of publications that offers insights into trends in emerging and developing countries, the switch also affords a unique opportunity to compare representations of the U.S. system. SEI contains other types of system-level data in addition to paper counts, and comparing them allows us to see which paper counts align best with other measures of the size of the science system. Figure 1 compares funding data with paper counts. For universities, SEI breaks down R&D expenditure and paper counts into almost exactly the same field classification, enabling a direct comparison. Both SEI 2014 (SCI/SSCI) and SEI 2016 (Scopus) report 2012 paper counts, and "Higher education R&D expenditures, by R&D field" for 2012 is reported in both editions as well. However, the funding numbers differ slightly between editions, so this comparison uses the 2014 version of the funding data, allowing us to focus on the changes in paper counts.
Overall, Scopus counts improve the representation of the U.S. science system in that the cost per paper is more equal across fields (the variance is reduced), especially if the agricultural sciences are ignored. Scopus indexes 88% more papers than SCI/SSCI overall, but only 46% more in agriculture, and agriculture is an outlier in the dollars-per-paper data. Several factors could explain why the agriculture data is off: Scopus may miss more of the agriculture literature than of other fields; money counted as agriculture may more often produce papers counted in another field; or universities may do more development work in agriculture, work that does not result in papers. Computing and engineering are the biggest beneficiaries of the switch to Scopus. Instead of looking like two of the most expensive fields, with costs far out of line with the others, the price of academic papers in computing is now in line with other non-biological, non-engineering fields, and the price of papers in engineering is similar to that of higher-cost fields such as the biological and environmental sciences. Scopus indexes 972% more computing papers than SCI/SSCI, largely because it includes conference proceedings, and conferences are more central to publishing in computing than in any other field. Scopus indexes 193% more engineering papers than SCI/SSCI; incorporating Compendex ensures that Scopus covers engineering, whereas SCI's origins meant it had historically been weak in areas other than the basic laboratory sciences.
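The dollars-per-paper comparison above reduces to simple arithmetic: divide each field's academic R&D expenditure by its paper count in each database, then ask whether the resulting "prices" become more equal. A minimal sketch in Python follows; all dollar amounts and paper counts are hypothetical placeholders, not actual SEI figures, and only the percentage increases for computing (972%) and engineering (193%) are taken from the text.

```python
from statistics import pvariance

# HYPOTHETICAL inputs for illustration only; field names follow SEI
# categories, but none of these numbers are actual SEI data.
funding_2012 = {  # higher-ed R&D expenditure, $ millions (hypothetical)
    "Computer sciences": 2_000,
    "Engineering": 10_000,
    "Biological sciences": 18_000,
}
papers_sci_ssci = {  # 2012 paper counts in SCI/SSCI (hypothetical)
    "Computer sciences": 4_000,
    "Engineering": 20_000,
    "Biological sciences": 60_000,
}
papers_scopus = {  # the same fields counted in Scopus (hypothetical bases,
    "Computer sciences": 42_880,   # scaled by the article's 972% increase
    "Engineering": 58_600,         # and 193% increase
    "Biological sciences": 90_000,
}

def dollars_per_paper(funding, papers):
    """Cost per paper in $ thousands: (funding in $M * 1000) / paper count."""
    return {field: funding[field] * 1_000 / papers[field] for field in funding}

cost_old = dollars_per_paper(funding_2012, papers_sci_ssci)
cost_new = dollars_per_paper(funding_2012, papers_scopus)

# The article's test of "better representation": dollars per paper should be
# more equal across fields, i.e. the population variance should shrink.
print(pvariance(cost_old.values()))
print(pvariance(cost_new.values()))
```

With these placeholder inputs, computing and engineering look expensive under SCI/SSCI counts and fall into line once the larger Scopus denominators are used, shrinking the cross-field variance; that is the pattern Figure 1 reports for the real data.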
An exhaustive comparison of the Scopus and SCI/SSCI publication counts for SEI is available in:
Grégoire Côté, Guillaume Roberge, Éric Archambault (2016) Bibliometrics and Patent Indicators for the Science and Engineering Indicators 2016: Comparison of 2016 Bibliometric Indicators to 2014 Indicators, January, Science-Metrix: Montreal, Canada.
All views expressed are those of the individual author and are not necessarily those of Science-Metrix, 1science or Georgia Institute of Technology.