Perspectives on tracking data reuse across biodata resources
- PMID: 38721398
- PMCID: PMC11076920
- DOI: 10.1093/bioadv/vbae057
Abstract
Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge.
Results: This article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023 and attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but that critical information for reproducibility is sometimes lacking. Challenges include the costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include the development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources.
Availability and implementation: Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
A.B. is Editor-in-Chief of Bioinformatics Advances, but was not involved in the editorial process of this manuscript.