Comput Sci Eng. 2020 Mar-Apr;22(2):22-32.
doi: 10.1109/mcse.2019.2952838. Epub 2019 Nov 12.

Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software, and Other Digital Artifacts

Chun-Nan Hsu et al. Comput Sci Eng. 2020 Mar-Apr.

Abstract

The Research Resource Identifier (RRID) was introduced in 2014 to better identify biomedical research resources and track their use across the literature, including key digital resources such as databases and software. Authors include an RRID after the first mention of any resource used. Here, we provide an overview of RRIDs and analyze their use for digital resource identification. We quantitatively compare the output of our RRID curation workflow with the outputs of automated text mining systems used to identify resource mentions in text. The results show that authors follow RRID reporting guidelines well, and that our natural language processing based text mining was able to identify nearly all of the resources identified by RRIDs as well as thousands more. Finally, we demonstrate how RRIDs and text mining can complement each other to provide a scalable solution to digital resource citation.
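As a rough illustration of how RRIDs placed after resource mentions could be collected from text, the sketch below uses a simplified regular expression. The pattern, the function name `extract_rrids`, and the identifiers in the sample sentence are assumptions made for illustration; they are not the curation workflow or the text mining systems evaluated in the paper.

```python
import re

# Simplified, assumed pattern: "RRID:" followed by a registry prefix, an
# underscore, and an identifier (e.g. SCR_..., AB_...). Real RRID syntax
# may allow forms this pattern does not cover.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9:_-]*[A-Za-z0-9])")


def extract_rrids(text: str) -> list[str]:
    """Return the distinct RRIDs found in text, in order of first appearance."""
    found: list[str] = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in found:
            found.append(rrid)
    return found


# Hypothetical sentence with placeholder identifiers.
sample = (
    "Images were processed with ExampleTool (RRID:SCR_000000) and stained "
    "with an example antibody (RRID:AB_0000000); ExampleTool "
    "(RRID:SCR_000000) was also used for counting."
)
rrids = extract_rrids(sample)
print(rrids)
```

A pattern like this only finds identifiers that authors wrote out explicitly, which is why the abstract pairs it with text mining of resource names.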

Conflict of interest statement

Conflicts of Interest: A. E. Bandrowski, M. Martone, and J. Grethe have an equity interest in SciCrunch, Inc., a company that develops services and tools based on RRIDs that may potentially benefit from the research results. Dr. Bandrowski and Dr. Martone are employed by the company. The terms of this arrangement have been reviewed and approved by the University of California San Diego in accordance with its conflict of interest policies.

Figures

Figure 1. Match and mismatch counts of PMID-RRID pairs among the datasets from the SciBot-Curator, RDW, and RRID-by-RDW. The Venn diagram illustrates how the counts were determined. Pairs in the SciBot-Curator dataset without the “incorrect” or “insufficientMetaData” tags were treated as the ground truth, so pairs from RDW or RDW-by-RRID that match the ground truth are true positives (TPs); otherwise they are false positives (FPs). Pairs tagged as “incorrect” or “insufficientMetaData” are negatives. The diagram shows that RDW captures nearly all of the ground truth but also produces many FPs.
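The counting rule in the caption can be sketched with set operations over PMID-RRID pairs. This is a minimal sketch under stated assumptions: the function name `score_pairs`, the data layout, and all PMIDs and RRIDs below are hypothetical, and the false-negative count is an addition for completeness that the caption does not describe.

```python
from typing import NamedTuple

Pair = tuple[str, str]  # (PMID, RRID)

# Tags that mark a curated pair as a negative, as named in the caption.
NEGATIVE_TAGS = {"incorrect", "insufficientMetaData"}


class Counts(NamedTuple):
    tp: int  # predicted pairs that match the ground truth
    fp: int  # predicted pairs that do not match the ground truth
    fn: int  # ground-truth pairs the system missed (not in the caption)


def score_pairs(curated: dict[Pair, set[str]], predicted: set[Pair]) -> Counts:
    """Count matches and mismatches of predicted pairs against curated pairs."""
    # Ground truth: curated pairs carrying none of the negative tags.
    truth = {pair for pair, tags in curated.items() if not (tags & NEGATIVE_TAGS)}
    return Counts(
        tp=len(predicted & truth),
        fp=len(predicted - truth),
        fn=len(truth - predicted),
    )


# Hypothetical curated pairs with their curator tags.
curated = {
    ("1001", "RRID:SCR_000001"): set(),
    ("1001", "RRID:SCR_000002"): {"incorrect"},
    ("1002", "RRID:SCR_000003"): set(),
}
# Hypothetical output of a text mining system.
predicted = {
    ("1001", "RRID:SCR_000001"),
    ("1001", "RRID:SCR_000002"),
    ("1002", "RRID:SCR_000003"),
    ("1003", "RRID:SCR_000004"),
}

counts = score_pairs(curated, predicted)
recall = counts.tp / (counts.tp + counts.fn)
precision = counts.tp / (counts.tp + counts.fp)
print(counts, recall, precision)
```

In this toy data the system recovers every ground-truth pair (recall 1.0) while half of its output counts as FPs (precision 0.5), the same qualitative pattern the caption reports for RDW.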
