Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
- PMID: 28662064
- PMCID: PMC5490878
- DOI: 10.1371/journal.pbio.2001414
Abstract
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
Conflict of interest statement
The authors have declared that no competing interests exist.
Grants and funding
- BB/M017702/1/BB_/Biotechnology and Biological Sciences Research Council/United Kingdom
- U24 DK097771/DK/NIDDK NIH HHS/United States
- U24 AI117966/AI/NIAID NIH HHS/United States
- BB/K019783/1/BB_/Biotechnology and Biological Sciences Research Council/United Kingdom
- BBS/E/B/000C0419/BB_/Biotechnology and Biological Sciences Research Council/United Kingdom
- U54 AI117925/AI/NIAID NIH HHS/United States
- U24 DA039832/DA/NIDA NIH HHS/United States
- R24 OD011883/OD/NIH HHS/United States
- BB/E006248/1/BB_/Biotechnology and Biological Sciences Research Council/United Kingdom
- P41 HG002273/HG/NHGRI NIH HHS/United States
- BB/E025080/1/BB_/Biotechnology and Biological Sciences Research Council/United Kingdom
- U41 HG007822/HG/NHGRI NIH HHS/United States
