The data-index: An author-level metric that values impactful data and incentivizes data sharing

Amelia S C Hood¹, William J Sutherland¹,²

Affiliations

¹ Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, UK.
² Biosecurity Research Initiative at St Catharine's (BioRISC), St Catharine's College, University of Cambridge, Cambridge, UK.

PMID: 34765110
PMCID: PMC8571609
DOI: 10.1002/ece3.8126

Amelia S C Hood et al. Ecol Evol. 2021 Oct 13;11(21):14344-14350. doi: 10.1002/ece3.8126. eCollection 2021 Nov.

Abstract

Author-level metrics are a widely used measure of scientific success. The h-index and its variants measure publication output (number of publications) and research impact (number of citations). They are often used to influence decisions, such as allocating funding or jobs. Here, we argue that the emphasis on publication output and impact hinders scientific progress in the fields of ecology and evolution because it disincentivizes two fundamental practices: generating impactful (and therefore often long-term) datasets and sharing data. We describe a new author-level metric, the data-index, which values both dataset output (number of datasets) and impact (number of data-index citations), and therefore promotes generating and sharing data. We discuss how it could be implemented and provide user guidelines. The data-index is designed to complement other metrics of scientific success: scientific contributions are diverse, and our value system should reflect that, both for the benefit of scientific progress and to make that system more equitable, diverse, and inclusive. Future work should focus on promoting other scientific contributions, such as communicating science, informing policy, mentoring other scientists, and providing open-access code and tools.

Keywords: FAIR research data; author‐level metrics; bibliometrics; data citation; data metrics; data sharing; dataset repositories; h‐index; open science.

Conflict of interest statement

None declared.

Figures

**FIGURE 1**
A composite figure with a hypothetical example that shows how the h‐index and data‐index for a data generator (i.e. someone who generates data, e.g. by conducting experiments) and data synthesist (i.e. someone who synthesizes research, e.g. through systematic reviews) at a similar career stage might differ. The h‐index is equal to the number of publications (*n_p*) that have *n_p* or more citations, whereas the data‐index is equal to the number of datasets (*n_d*) that have *n_d* or more data‐index citations. For both indices, publications or datasets are considered the same whether the author was primary author or coauthor. (a, b) Tables showing example data used to calculate the h‐index and data‐index shown in plots (c–f); papers with original data (highlighted in gray) are the only ones included in the calculation of the data‐index. Scatterplots with (c, d) publications ranked by citations to calculate the h‐index and (e, f) datasets ranked by data‐index citations to calculate the data‐index. Dashed lines show identity lines, and colored lines show the final publication/dataset used to calculate the index value, which is also colored. In this hypothetical example, the data generator has a lower h‐index (4) than the data synthesist (6), but a higher data‐index (6 vs. 2). *Cit*. is an abbreviation for citation.
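The shared definition in the Figure 1 caption can be illustrated with a minimal Python sketch. The function name `h_type_index` and the citation counts are hypothetical and are not the values shown in the figure; the same routine gives an h‐index when fed publication citations and a data‐index when fed data‐index citations of datasets.

```python
def h_type_index(citation_counts):
    """Return the largest n such that n items each have at least n citations."""
    # Rank items from most to least cited, as in the scatterplots.
    ranked = sorted(citation_counts, reverse=True)
    index = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            index = rank  # this item still lies on or above the identity line
        else:
            break
    return index


# Hypothetical counts for one author (assumed for illustration only).
publication_citations = [25, 12, 9, 4, 3, 1]
dataset_data_index_citations = [40, 8, 2]

h_index = h_type_index(publication_citations)
data_index = h_type_index(dataset_data_index_citations)
print(h_index, data_index)
```

Only the input differs between the two metrics: which items are counted (publications or datasets) and which citation score is attached to each.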

**FIGURE 2**
Diagrams showing that (a) data citations are calculated by summing the first‐level citations of a dataset, whereas (b, c) data‐index citations are calculated by summing the first‐level citations of a dataset or publication that contains an original dataset and any higher‐level citations of datasets or publications that have reused data from the original dataset or publication. (d, e) In cases where the same dataset has multiple identifiers (e.g. if the dataset has a unique identifier and a publication describing it has a different unique identifier), existing citation mapping software can be used to automatically group them and therefore avoid the same dataset being double‐counted; parallel lines show datasets and publications that are grouped. Abbreviations are as follows: *Datas*. = dataset, *Publ*. = publication, *Datas*./*Publ*. (reuse) = dataset or publication that has reused data from the original dataset. Arrows show the direction of citation, and numbers in black show the value this citation gives to calculating the citation score of the original dataset. White numbers in gray circles show the (a) data citation and (b–e) data‐index citation scores of the datasets beside them. Citation levels for (a–c) are shown on the left.
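One way to read the counting rule in the Figure 2 caption is as a traversal of the citation graph that only continues through works that reused the data. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the function name, the input structures (`cited_by`, `reused_data`, `canonical`), and the example identifiers are all hypothetical, and counting each citing work at most once is an assumption that the caption does not specify.

```python
from collections import deque


def data_index_citations(original, cited_by, reused_data, canonical=None):
    """Count citations of `original` plus citations of works that reused its data.

    cited_by:    dict mapping an item to the set of items that cite it
    reused_data: set of items that reused data from the original
    canonical:   optional dict grouping alternative identifiers of one item
    """
    canonical = canonical or {}

    def canon(item):
        return canonical.get(item, item)

    # Merge citation lists of identifiers that refer to the same item,
    # so a dataset and its describing publication are not double-counted.
    merged = {}
    for item, citers in cited_by.items():
        merged.setdefault(canon(item), set()).update(canon(c) for c in citers)

    root = canon(original)
    reusers = {canon(r) for r in reused_data}
    counted, visited, queue = set(), {root}, deque([root])
    while queue:
        item = queue.popleft()
        for citer in merged.get(item, ()):
            if citer == root:
                continue  # a grouped identifier citing itself adds nothing
            counted.add(citer)  # assumption: each citing work counts once
            # Follow citations only through works that reused the data.
            if citer in reusers and citer not in visited:
                visited.add(citer)
                queue.append(citer)
    return len(counted)


# Hypothetical citation graph (identifiers invented for illustration).
cited_by = {
    "dataset": {"paperA", "reuseB"},
    "data_paper": {"paperC"},          # same dataset, different identifier
    "reuseB": {"paperD", "paperE"},    # reused the data, so these count
    "paperA": {"paperF"},              # no reuse, so paperF is not counted
}
score = data_index_citations(
    "dataset", cited_by, reused_data={"reuseB"},
    canonical={"data_paper": "dataset"},
)
print(score)
```

In this toy graph the plain data citation score of `dataset` is 2 (its first‐level citations only), while the data‐index citation score also credits the grouped identifier and the citations received by the reusing work.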
