Rationalizing the extremes: introducing the citation distribution index
May 10, 2018

The distribution of citations among the scientific literature is in many respects similar to the distribution of wealth in the Western World: a handful of articles receive most of the citations, while most of the articles receive few or no citations at all. The distribution of citations is indeed highly skewed and not well represented by its average (the so-called “Bill Gates effect” in the wealth distribution analogy). In fact, when the average is computed for entities that publish a small number of articles, a few highly cited articles could be enough to propel these entities into research superstardom. In this post, we’ll look at an alternative metric Science-Metrix has developed to address the limitations of averages, as well as some of the other metrics we explored to get to that point.

The average (or, more precisely in this case, the arithmetic mean) remains the best-known descriptive metric of a statistical distribution and is generally well understood. Science-Metrix has traditionally and consistently presented the average of relative citations (ARC) in its reports, pointing to the limits of the indicator when computed for small entities and avoiding the computation on small numbers of publications (i.e., < 30).
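
The small-numbers problem is easy to reproduce. The following Python sketch is illustrative only: the function name `arc` and the sample values are hypothetical, and it assumes that a relative citation score has already been computed for each article. Only the threshold of 30 publications comes from the practice described above.

```python
from statistics import mean

# Threshold below which the ARC is not reported (from the text: < 30 publications).
MIN_PUBLICATIONS = 30


def arc(relative_citations, min_publications=MIN_PUBLICATIONS):
    """Return the arithmetic mean of relative citations, or None for small sets."""
    if len(relative_citations) < min_publications:
        return None  # too few publications for the average to be trusted
    return mean(relative_citations)


# Hypothetical small entity: four rarely cited articles and one outlier.
small_entity = [0.0, 0.0, 1.0, 0.0, 24.0]

suppressed = arc(small_entity)                                # None: fewer than 30 publications
forced = arc(small_entity, min_publications=1)                # 5.0, driven by a single article
without_outlier = arc(small_entity[:4], min_publications=1)   # 0.25
```

Removing one article moves the average from 5.0 to 0.25, which is the kind of instability the 30-publication rule is meant to keep out of reports.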

Conscious of the ARC’s limitation, Science-Metrix has also presented the highly cited publications (HCP) index. An indicator of impact/influence, often used as a proxy for excellence, the HCP is the proportion of an entity’s publications that figure among the most cited, either in the top 10%, 5% or 1%; all publications in the top X% are “worth” the same, as are all publications outside that group, regardless of the number of citations they received. Although the ARC and the HCP tend to be strongly correlated, the HCP is a binary-based indicator (either a specific article figures among the most cited, or it doesn’t) and, as such, doesn’t necessarily reflect the overall performance of an entity; some articles have indeed been frequently cited, but not quite enough to make the top 10% or a higher level of performance. In this respect, while being a valuable indicator, the HCP does miss some of the detail within the distribution of citations, which makes it prone to its own shortcomings.
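
The binary nature of the HCP can be shown with a few lines of Python. This is a simplified sketch, not Science-Metrix's implementation: the function `hcp`, the threshold of 50 citations and the sample entities are all hypothetical, and it assumes every article belongs to a single reference set whose top-10% cut-off is already known.

```python
def hcp(citation_counts, threshold):
    """Share of publications with at least `threshold` citations."""
    if not citation_counts:
        raise ValueError("an entity needs at least one publication")
    in_top = sum(1 for c in citation_counts if c >= threshold)
    return in_top / len(citation_counts)


THRESHOLD = 50  # hypothetical citation count needed to enter the top 10%

near_misses = hcp([49, 49, 49, 49], THRESHOLD)   # 0.0
never_cited = hcp([0, 0, 0, 0], THRESHOLD)       # 0.0, the same score as above
one_hit = hcp([50, 0, 0, 0], THRESHOLD)          # 0.25
```

An entity whose articles all narrowly miss the cut-off scores the same as one that is never cited, and lower than one with a single article just above the line.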

Percentile-based approaches are generally viewed as robust alternatives to average-based citation metrics. The median of relative citations was another metric that Science-Metrix explored. At first glance, it appears to be a simple and straightforward substitute for the ARC, but one is abruptly reminded of its limitations when the median of a distribution of citations is in fact 0 (a situation that occurs more often than one might realize), in which case the concept of “relative citations” is mathematically meaningless.
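
One possible reading of the problem, sketched in Python: if an article's citations are expressed relative to the median of its reference set, the ratio is undefined whenever that median is 0. The function name and the sample data are hypothetical, and the exact normalization Science-Metrix tested may differ.

```python
from statistics import median


def citations_relative_to_median(citations, reference_counts):
    """Divide a citation count by the median of its reference set, if possible."""
    reference_median = median(reference_counts)
    if reference_median == 0:
        return None  # division by zero: the ratio is undefined
    return citations / reference_median


# Hypothetical reference set in which more than half of the articles are uncited.
mostly_uncited = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]

undefined = citations_relative_to_median(5, mostly_uncited)    # None (median is 0)
defined = citations_relative_to_median(6, [1, 2, 3, 4, 10])    # 2.0 (median is 3)
```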

As a result, Science-Metrix has recently started to include in its reports a new indicator that seems to be immune to all the imperfections listed so far: the citation distribution index (CDI). The first step in calculating the CDI consists of distributing all articles in the given database, segregated by subfield, publication year and publication type, among 10 groups of equal size (i.e., 10 deciles) according to the number of citations that they received, with the least cited articles belonging to the 1st decile and the most cited articles belonging to the 10th decile. The 10th decile is in fact equivalent to the HCP10% discussed above.

Articles that have the same number of citations and that fall on the line between two deciles are fractioned across multiple deciles to guarantee that each decile contains precisely 10% of the distribution (e.g., each decile of a distribution of 113 publications would contain 11.3 publications). Once the global distribution has been performed, it is possible to examine how the papers of a given entity are distributed across citation deciles. All things being equal, one would expect each decile to contain 10% of an entity’s publications. In the real world, however, some entities perform better than others and find a larger-than-expected proportion of their publications in the upper echelons of the distribution. An incremental weight is applied to each decile before summing all 10 of the decile scores to produce the CDI. Science-Metrix uses a weight that ranges from -5 to -1 for the first five deciles and a weight that ranges from 1 to 5 for the last five deciles.
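
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the production method: it uses a single reference set (in practice there is one per subfield, publication year and publication type), it assumes every article of the entity is also in the reference set, and it assumes that a decile score is the weight multiplied by the share of the entity's articles in that decile expressed as a fraction, so that the result runs from -5 to 5. The scaling used in Science-Metrix reports may differ. The names `decile_fractions` and `cdi` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Decile weights given in the text: -5..-1 for deciles 1-5, 1..5 for deciles 6-10.
WEIGHTS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]


def decile_fractions(reference_counts):
    """Map each distinct citation count to the fraction of one such article
    that falls in each of the 10 deciles of the reference set."""
    n = len(reference_counts)
    tally = Counter(reference_counts)
    table = {}
    start = Fraction(0)  # rank position where the current group of ties begins
    for count in sorted(tally):  # least cited first
        size = tally[count]
        end = start + size
        shares = []
        for k in range(10):
            lower, upper = Fraction(k * n, 10), Fraction((k + 1) * n, 10)
            overlap = max(Fraction(0), min(end, upper) - max(start, lower))
            shares.append(overlap / size)  # tied articles share the overlap equally
        table[count] = shares
        start = end
    return table


def cdi(entity_counts, reference_counts):
    """Weighted sum of the entity's share of articles in each decile
    (assumed scaling: shares as fractions, so the result lies in [-5, 5])."""
    table = decile_fractions(reference_counts)
    n_entity = len(entity_counts)
    shares = [sum(table[c][k] for c in entity_counts) / n_entity for k in range(10)]
    return sum(w * s for w, s in zip(WEIGHTS, shares))


# Hypothetical reference set of 10 articles, half of them uncited.
reference = [0, 0, 0, 0, 0, 1, 2, 3, 5, 8]

whole_set = cdi(reference, reference)   # 0: every decile holds exactly 10%
top_only = cdi([8], reference)          # 5: a single article in the 10th decile
uncited_only = cdi([0], reference)      # -3: spread evenly over deciles 1 to 5
```

Exact fractions are used so that each decile holds precisely one tenth of the reference set, however the ties fall. Under this scaling, an entity whose articles are spread evenly across the deciles scores 0.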

The citation distribution index can be conveniently visualized using a chart similar to the following figure.

[Figure: citation distribution index (CDI) chart]

The bold line represents the expected proportion of articles for each decile (i.e., 10%). Red bars indicate that fewer articles than expected are observed in a given decile, while green bars indicate the opposite. A realistic good-case scenario is illustrated in the above figure. Ideally, one would wish to be underrepresented among the weakest deciles (long red bars on the left) and overrepresented among the strongest deciles (long green bars on the right). To be underrepresented in the bottom deciles is certainly not a bad thing!
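
The logic behind the chart is simple enough to sketch. In the Python snippet below, the function `classify_bars` and the decile shares are hypothetical; the shares are invented to resemble a good-case scenario and are not taken from the figure.

```python
EXPECTED_SHARE = 10  # percent of an entity's articles expected in each decile


def classify_bars(decile_shares):
    """Return one (deviation, bar) pair per decile, in percentage points."""
    bars = []
    for share in decile_shares:
        deviation = share - EXPECTED_SHARE
        if deviation < 0:
            bar = "red"     # fewer articles than expected
        elif deviation > 0:
            bar = "green"   # more articles than expected
        else:
            bar = "none"    # exactly on the 10% line
        bars.append((deviation, bar))
    return bars


# Hypothetical shares (in percent) for deciles 1 to 10; they sum to 100.
good_case = [4, 6, 7, 8, 9, 10, 12, 13, 15, 16]
bars = classify_bars(good_case)
```

Here the first five deciles come out red and the last four green, the left-to-right pattern one hopes to see.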

For more detail on the development and calculation of the CDI, please see Campbell et al.’s conference paper, “An approach for the condensed presentation of intuitive citation impact metrics which remain reliable with very few publications” (pp. 1229–1240). Note that in this paper, the citation distribution index is referred to by a former name, the relative integration score (RIS).


About the authors

Simon Provençal and David Campbell

Dr. Simon Provençal joined Science-Metrix in early 2016. As a research analyst, he contributes his strong programming, analytical and bilingual writing skills to the Bibliometrics team’s projects. As a scholar and a professional, Simon has accumulated over seven years of experience in the analysis of quantitative data in the context of scientific research and performance evaluation. He has also led several short- and long-term research projects in collaboration with institutions of local and international reach, such as Quebec’s Ministry of the Environment and NASA. His research outcomes have been published in scientific articles, technical reports and academic documents.

David Campbell is the Chief Scientist at Science-Metrix, where he uses his quantitative and data mining expertise to develop and refine bibliometric and other S&T indicators, as well as original analysis methods. Since 2004, David has contributed to approximately 100 studies for Canadian clients, including most national research councils and federal departments. He has also contributed to studies for international organizations such as the European Commission and the US National Science Foundation. In 2015 and 2016, David led the two-year development of a methodological framework to guide the design and implementation of data mining projects aimed at supporting the formulation, implementation and evaluation of R&I policies. He also led the design of five new indicators examining the gender dimension in science for the European Commission’s She Figures 2015. He regularly presents at conferences and has published several peer-reviewed papers.
