PLoS Biol. 2017 Jun 29;15(6):e2001414.
doi: 10.1371/journal.pbio.2001414. eCollection 2017 Jun.

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Julie A McMurry et al. PLoS Biol. 2017.

Abstract

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision, and reuse of identifiers. We also outline important considerations for those who reference identifiers in various circumstances, including authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Anatomy of a web-based identifier.
An exemplary Uniform Resource Identifier (URI) is below; it is composed of American Standard Code for Information Interchange (ASCII) characters and follows a pattern that starts with a fixed set of characters (URI pattern). That URI pattern is followed by a local identifier (local ID), an identifier that, by itself, is only guaranteed to be locally unique within the database or source. A local ID is sometimes referred to as an “accession.” Note that this figure illustrates the simplest representation; nuances regarding versioning are covered in Lesson 6 and Fig 5.
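The anatomy described in this caption can be sketched in a few lines of Python. This is only an illustration, not code from the article: the function name `split_uri` and the `https://example.org/exdb/` pattern are hypothetical placeholders, and the ASCII check simply mirrors the caption's description of the example URI.

```python
from typing import Tuple


def split_uri(uri: str, uri_pattern: str) -> Tuple[str, str]:
    """Split a URI into its fixed URI pattern and its local ID.

    Both `uri_pattern` and the example values used with this function are
    hypothetical; real providers publish their own patterns.
    """
    if not uri.isascii():
        raise ValueError("URI contains non-ASCII characters")
    if not uri.startswith(uri_pattern):
        raise ValueError("URI does not start with the expected URI pattern")
    # Everything after the fixed pattern is the local ID ("accession").
    local_id = uri[len(uri_pattern):]
    if not local_id:
        raise ValueError("URI has no local ID after the pattern")
    return uri_pattern, local_id


pattern, local_id = split_uri(
    "https://example.org/exdb/ABC123",  # hypothetical URI
    "https://example.org/exdb/",        # hypothetical URI pattern
)
```

Because the local ID is only guaranteed to be unique within its source, a caller would normally keep the pattern and the local ID together rather than store the local ID alone.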
Fig 2. A summary of the 10 recommendations and their direct or indirect impact on different kinds of identifier roles.
Fig 3. Contributions and roles related to content as they correspond to identifier creation versus identifier reuse.
The decision about whether to create a new identifier or reuse an existing one depends on the role you play in the creation, editing, and republishing of content; for certain roles (and when several roles apply) that decision is a judgement call. Asterisks convey cases in which the best course of action is often to correct/improve the original record in collaboration with the original source; the guidance about identifier creation versus reuse is meant to apply only when such collaboration is not practicable (and an alternate record is created). It is common that a given actor may have multiple roles along this spectrum; for instance, a given record in monarchinitiative.org may reflect a combination of (a) corrections Monarch staff made in collaboration with the original data source, (b) post-ingest curation by Monarch staff, (c) expanded content integrated from multiple sources.
Fig 4. Examples of provisioning resolvable Uniform Resource Identifiers (URIs).
The panels show compact URIs (CURIEs; Panel A), URIs (Panel B), and access URLs (Panel C) for providers with no redirection (the Zebrafish Information Network [ZFIN]), in-house redirection (UniProt and Ensembl), and third-party resolvers (using identifiers.org and digital object identifiers [DOI]). In each case, the URI can be algorithmically derived from the CURIE because the local identifier (local ID) portion itself is included (unmodified) within the URI. Access URL design patterns differ substantially by provider and may change over time. As long as access URLs (and other ephemeral links) are not used as the referenced identifier, they can include prefix and colon (Mouse Genome Informatics [MGI]) or not (Ensembl), they may include the entire local ID (Biosample) or not (DOI), and they may include type (MGI) or not (ZFIN).
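The statement that a URI can be derived algorithmically from a CURIE can be illustrated with a minimal Python sketch. It assumes a CURIE of the form prefix:local ID and a lookup table from prefix to URI pattern; the prefix `exdb`, the table `PREFIX_MAP`, and the `https://example.org/exdb/` pattern are hypothetical and do not come from the article or from any real provider.

```python
# Hypothetical prefix-to-URI-pattern table; real mappings are published by
# providers or by resolvers and are not reproduced here.
PREFIX_MAP = {"exdb": "https://example.org/exdb/"}


def curie_to_uri(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand a CURIE into a URI by appending the unmodified local ID."""
    # Split on the first colon only, so a local ID may itself contain colons.
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError("expected a CURIE of the form 'prefix:local_id'")
    if prefix not in prefix_map:
        raise ValueError("unknown CURIE prefix: " + prefix)
    # The local ID is appended unchanged, which keeps the mapping reversible.
    return prefix_map[prefix] + local_id


uri = curie_to_uri("exdb:ABC123")
```

The expansion works only because the local ID is carried over unchanged; an access URL that drops or rewrites part of the local ID could not be derived this way.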
Fig 5. Record-level versioning and release-level versioning.
Fig 6. Eagle-i record-level citation widget.
