Sci Data. 2023 Feb 23;10(1):99.
doi: 10.1038/s41597-023-01968-9.

Developing a standardized but extendable framework to increase the findability of infectious disease datasets

Ginger Tsueng et al. Sci Data. 2023.

Abstract

Biomedical datasets are growing in size, are stored across many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve the FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard such as Schema.org, which is widely adopted by generalist repositories. Consequently, datasets in these repositories are not findable through aggregation projects like Google Dataset Search. We addressed this gap by creating a reusable metadata schema based on Schema.org and cataloguing nearly 400 datasets and computational tools that we collected. The approach can readily be reused to create schemas that are interoperable with community standards yet customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
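To make the idea of a Schema.org-based metadata record concrete, the following is a minimal sketch in Python of a Dataset record serialised as JSON-LD, with a simple completeness check. It is not the consortium's schema: the helper names `make_dataset_record` and `missing_properties`, the choice of `name` and `description` as the required properties, and all field values are assumptions made for illustration only.

```python
import json

# Hypothetical set of properties treated as required for this sketch; the
# consortium's actual schema defines its own required and optional properties.
REQUIRED = ("name", "description")


def make_dataset_record(name, description, **extra):
    """Build a Schema.org Dataset record as a plain dict (JSON-LD)."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    # Optional properties (e.g. keywords) are copied through as given.
    record.update(extra)
    return record


def missing_properties(record, required=REQUIRED):
    """Return the required properties that are absent or empty."""
    return [prop for prop in required if not record.get(prop)]


# Placeholder values, not taken from any registered dataset.
record = make_dataset_record(
    name="Example infectious disease dataset",
    description="Placeholder description used only for illustration.",
    keywords=["example"],
)

print(json.dumps(record, indent=2))
print(missing_properties(record))
```

A record built this way could be extended with further Schema.org properties by passing them as keyword arguments, which mirrors the "standardized but extendable" intent described in the abstract.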

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Distribution of Schema.org standards in common data repositories. (a) Schema.org compliance in common biomedical repositories (see File 1 available on Zenodo). (b) Within Schema.org-compliant repositories, each source used the standard differently. While description and name were almost universally provided for each dataset, other metadata properties were more inconsistently used within and between sources. Google Dataset Search marginality is provided in ref. .
Fig. 2
(a) The DDE Dataset registration flowchart illustrates how to register a dataset with minimal effort. (b) The portal compatibility checker tool helps identify Schema.org-compliant repositories. Available at https://discovery.biothings.io/compatibility.
Fig. 3
NIAID Systems Biology Consortium Dataset and ComputationalTool Catalog. The metadata is registered and available through the Data Discovery Engine (DDE). (a) The DDE provides an interface to search for Datasets and ComputationalTools registered according to the NIAID SysBio schemas. (b) Example metadata page for a Dataset registered on the DDE according to the NIAID SysBio schema. (c) The same dataset in b viewed in Google Dataset Search after its registration in the DDE. The standardized metadata is exposed as structured data markup, allowing web crawlers such as Google Dataset Search to discover them, increasing their findability. (d) Summary statistics for the Datasets and ComputationalTools registered by the NIAID Systems Biology groups. (e) Comparison of measurement techniques by pathogen in registered datasets.
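
The caption for Fig. 3 notes that standardized metadata is exposed as structured data markup that web crawlers can discover. A common way to expose Schema.org metadata is a `<script type="application/ld+json">` element in the page, and the sketch below shows how a crawler might read it using only the Python standard library. It is an illustration under assumptions, not a description of how the DDE or Google Dataset Search is implemented: the `JsonLdExtractor` class, the `extract_datasets` helper, and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of JSON-LD <script> blocks in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_datasets(html):
    """Return the JSON-LD records in `html` whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        r for r in parser.records
        if isinstance(r, dict) and r.get("@type") == "Dataset"
    ]


# Hypothetical page with placeholder metadata, for illustration only.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Example dataset", "description": "Placeholder."}
</script>
</head><body></body></html>"""

datasets = extract_datasets(page)
print([d["name"] for d in datasets])
```

This sketch handles only the simplest case of one top-level record per script block; real pages may also use arrays, `@graph` containers, or list-valued `@type` fields, which would need extra handling.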

References

    1. Siebert M, et al. Data-sharing recommendations in biomedical journals and randomised controlled trials: an audit of journals following the ICMJE recommendations. BMJ Open. 2020;10:e038887. doi: 10.1136/bmjopen-2020-038887.
    2. Springer Nature Data Availability Statements. Springer Nature. https://www.springernature.com/gp/authors/research-data-policy/data-avai....
    3. Science Data and Code Deposition Policy. Science Journals: editorial policies. https://www.science.org/content/page/science-journals-editorial-policies.
    4. The EMBO Journal: Author Guidelines. https://www.embopress.org/page/journal/14602075/authorguide doi: 10.1002/(ISSN)1460-2075.
    5. Information for Authors: Cell. https://www.cell.com/cell/authors.