Nat Genet. 2017 May 26;49(6):816-819.
doi: 10.1038/ng.3864.

Finding useful data across multiple biomedical data repositories using DataMed

Lucila Ohno-Machado et al. Nat Genet.

Abstract

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics—along with interoperability and reusability—compose the four FAIR principles to facilitate knowledge discovery in today’s big data–intensive science landscape.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Figures

Figure 1
Data sources have various metadata specifications, which undergo ingestion into the common DATS model, whose metadata elements are used for indexing and DataMed searches. A terminology server is used to expand, transform and standardize concepts used in metadata descriptions and in user queries. The user is only responsible for submitting a query in natural language to the DataMed search engine, such as “astrocytoma and IDH1.” (The figure uses illustrations from PresenterMedia.com.)
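
The flow described in the Figure 1 caption can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the repository names, the field mappings, the common record shape and the synonym table are hypothetical and are not the DATS specification, the DataMed code or its terminology server. A linear scan over the normalized records stands in for a real index.

```python
import re

# Hypothetical per-repository mappings from source field names to a
# common record shape (a toy stand-in for a shared metadata model).
FIELD_MAPS = {
    "repo_a": {"name": "title", "summary": "description"},
    "repo_b": {"dataset_title": "title", "abstract": "description"},
}

# Toy stand-in for a terminology server: each term maps to its synonyms.
SYNONYMS = {
    "astrocytoma": {"astrocytoma", "astrocytic tumor"},
    "idh1": {"idh1", "isocitrate dehydrogenase 1"},
}


def to_common(source, record):
    """Rename source-specific fields into the common record shape."""
    mapping = FIELD_MAPS[source]
    common = {dst: record[src] for src, dst in mapping.items() if src in record}
    common["source"] = source
    return common


def parse_query(query):
    """Split a query such as 'astrocytoma and IDH1' into lowercase terms."""
    parts = re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE)
    return [p.lower() for p in parts if p]


def expand(term):
    """Return the term together with any known synonyms."""
    return SYNONYMS.get(term, {term})


def search(query, records):
    """Keep records whose text matches every query term or one of its synonyms."""
    terms = parse_query(query)
    hits = []
    for rec in records:
        text = f"{rec.get('title', '')} {rec.get('description', '')}".lower()
        if all(any(syn in text for syn in expand(t)) for t in terms):
            hits.append(rec)
    return hits


# Two made-up source records with different metadata layouts.
records = [
    to_common("repo_a", {
        "name": "Astrocytoma expression profiles",
        "summary": "RNA-seq of tumors with isocitrate dehydrogenase 1 mutations",
    }),
    to_common("repo_b", {
        "dataset_title": "Glioma imaging",
        "abstract": "MRI scans of astrocytic tumor patients",
    }),
]

results = search("astrocytoma and IDH1", records)
```

In this sketch the first record is returned even though it never contains the string "IDH1", because the query term is expanded to a synonym before matching. The second record matches "astrocytoma" through a synonym but has no match for the second term, so it is excluded.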
Figure 2
Community input to the Data Discovery Index Consortium. Working groups involved over 86 people from multiple institutions to scope the project via use cases, develop core metadata specifications, recommend identifier strategies, develop and test the search engine prototype, and discuss issues in data citation. Additionally, bioCADDIE funded external pilot projects for development of software that will be incorporated into the DataMed prototype.
