The distribution of citations among the scientific literature is in many respects similar to the distribution of wealth in the Western world: a handful of articles receive most of the citations, while the majority receive few citations or none at all. The distribution of citations is indeed highly skewed and not well represented by its average (the so-called “Bill Gates effect” in the wealth distribution analogy). In fact, when the average is computed for entities that publish a small number of articles, a few highly cited articles can be enough to propel these entities into research superstardom. In this post, we’ll look at an alternative metric Science-Metrix has developed to address the limitations of averages, as well as some of the other metrics we explored to get to that point.
The average (or, more precisely in this case, the arithmetic mean) remains the most well-known descriptive metric of a statistical distribution and is generally well understood. Science-Metrix has traditionally and consistently presented the average of relative citations (ARC) in its reports, pointing to the limits of the indicator when computed for small entities and avoiding computing it for small sets of publications (i.e., fewer than 30).
Conscious of the ARC’s limitations, Science-Metrix has also presented the highly cited publications (HCP) index. An indicator of impact/influence, often used as a proxy for excellence, the HCP is the proportion of an entity’s publications that figure among the most cited, either in the top 10%, 5% or 1%; all publications in the top X% are “worth” the same, as are all publications outside that group, regardless of the number of citations they received. Although the ARC and the HCP tend to be strongly correlated, the HCP is a binary indicator (either a specific article figures among the most cited, or it doesn’t) and, as such, doesn’t necessarily reflect the overall performance of an entity; some articles are frequently cited, but not quite enough to make the top 10% or a higher level of performance. So while the HCP is a valuable indicator, it misses some of the detail within the distribution of citations, giving it shortcomings of its own.
Percentile-based approaches are generally viewed as robust alternatives to average-based citation metrics. The median of relative citations was another metric that Science-Metrix explored. At first glance, it appears to be a simple and straightforward substitute for the ARC, but one is abruptly reminded of its limitations when the median of a distribution of citations is in fact 0 (a situation that occurs more often than one might realize), in which case the concept of “relative citations” is mathematically meaningless.
As a result, Science-Metrix has recently started to include in its reports a new indicator that seems to be immune to all the imperfections listed so far: the citation distribution index (CDI). The first step in calculating the CDI consists of distributing all articles in the given database, grouped by subfield, publication year and publication type, among 10 groups of equal size (i.e., 10 deciles) according to the number of citations they received, with the least cited articles belonging to the 1st decile and the most cited articles belonging to the 10th decile. The 10th decile is in fact equivalent to the HCP10% discussed above.
Articles that have the same number of citations and that fall on the line between two deciles are split fractionally across multiple deciles to guarantee that each decile contains precisely 10% of the distribution (e.g., each decile of a distribution of 113 publications would contain 11.3 publications). Once the global distribution has been performed, it is possible to examine how the papers of a given entity are distributed across citation deciles. All else being equal, one would expect each decile to contain 10% of an entity’s publications. In the real world, however, some entities perform better than others and find a larger-than-expected proportion of their publications in the upper echelons of the distribution. An incremental weight is applied to each decile before summing all 10 of the decile scores to produce the CDI. Science-Metrix uses weights ranging from −5 to −1 for the first five deciles and from +1 to +5 for the last five deciles.
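The steps above can be sketched in code. This is a minimal illustration, not Science-Metrix’s actual implementation: it skips the normalization by subfield, publication year and publication type described above, and it assumes the “decile score” being weighted is the entity’s share of publications in each decile. (Note that because the weights sum to zero, weighting the raw shares gives the same result as weighting the deviations from the expected 10%.)

```python
from bisect import bisect_left, bisect_right

# Incremental decile weights: -5 to -1 for the bottom five, +1 to +5 for the top five.
WEIGHTS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]

def decile_memberships(c, sorted_global):
    """Fractional share of each decile occupied by a paper with c citations.

    All papers tied at c citations span the rank interval [lo, hi) in the
    ascending global distribution; the paper is spread uniformly over that
    interval and intersected with the decile boundaries, so a block of tied
    papers straddling a boundary is fractioned across deciles.
    """
    n = len(sorted_global)
    lo = bisect_left(sorted_global, c)
    hi = bisect_right(sorted_global, c)
    shares = []
    for d in range(10):
        a, b = n * d / 10, n * (d + 1) / 10   # rank boundaries of decile d+1
        overlap = max(0.0, min(hi, b) - max(lo, a))
        shares.append(overlap / (hi - lo))
    return shares  # the ten shares sum to 1 for each paper

def cdi(entity_citations, global_citations):
    """Citation distribution index: weighted sum of the entity's decile shares."""
    sorted_global = sorted(global_citations)
    totals = [0.0] * 10
    for c in entity_citations:
        for d, share in enumerate(decile_memberships(c, sorted_global)):
            totals[d] += share
    n = len(entity_citations)
    return sum(w * t / n for w, t in zip(WEIGHTS, totals))
```

Under this scheme the CDI ranges from −5 (all publications in the bottom decile) to +5 (all in the top decile), and an entity whose publications mirror the global distribution scores exactly 0.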
The citation distribution index can be conveniently visualized using a chart similar to the following figure.
The bold line represents the expected proportion of articles for each decile (i.e., 10%). Red bars indicate that fewer articles than expected are observed in a given decile, while green bars indicate the opposite. The above figure illustrates a realistic good-case scenario: ideally, one would be underrepresented among the weakest deciles (long red bars on the left) and overrepresented among the strongest deciles (long green bars on the right). Being underrepresented in the bottom deciles is certainly not a bad thing!
For more detail on the development and calculation of the CDI, please see Campbell et al.’s conference paper, “An approach for the condensed presentation of intuitive citation impact metrics which remain reliable with very few publications”: https://inis.iaea.org/search/search.aspx?orig_q=RN:48050915 (pp. 1229–1240). Note that in this paper, the citation distribution index is referred to by a former name, the relative integration score (RIS).