Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description
- PMID: 28269904
- PMCID: PMC5333253
Abstract
Scientific reproducibility is key to scientific progress because it allows the research community to build on validated results, protect patients from potentially harmful trial drugs derived from incorrect results, and reduce waste of valuable resources. The National Institutes of Health (NIH) recently published a systematic guideline titled "Rigor and Reproducibility" for supporting reproducible research studies, which has also been accepted by several scientific journals. These journals will require published articles to conform to the new guidelines. Provenance metadata describes the history or origin of data, and it has long been used in computer science to ensure data quality and support scientific reproducibility. In this paper, we describe the development of the Provenance for Clinical and healthcare Research (ProvCaRe) framework, together with a provenance ontology that supports scientific reproducibility by formally modeling a core set of data elements representing the details of a research study. We extend the PROV Ontology (PROV-O), which the World Wide Web Consortium (W3C) recommends as the provenance representation model, to represent both (a) data provenance and (b) process provenance. We use 124 study variables from 6 clinical research studies from the National Sleep Research Resource (NSRR) to evaluate the coverage of the provenance ontology. NSRR is the largest repository of NIH-funded sleep datasets, with 50,000 studies from 36,000 participants. The provenance ontology reuses concepts from existing biomedical ontologies, for example the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), to model the provenance information of research studies. The ProvCaRe framework is being developed as part of the Big Data to Knowledge (BD2K) data provenance project.
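To make the idea of extending PROV-O more concrete, the following is a minimal sketch in plain Python, not the ProvCaRe ontology itself. The PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`) are real W3C terms; everything in the `EX` namespace (the class names `StudyVariable` and `DataCollection`, and the instances) is hypothetical and chosen only for illustration. Treating data provenance as `prov:Entity` subclasses and process provenance as `prov:Activity` subclasses is likewise an assumption of this sketch.

```python
# Sketch only: provenance statements as (subject, predicate, object) triples.
PROV = "http://www.w3.org/ns/prov#"            # real W3C PROV-O namespace
EX = "http://example.org/provcare-sketch#"     # hypothetical namespace

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

triples = {
    # Hypothetical extension classes hung under PROV-O core classes.
    (EX + "StudyVariable", RDFS_SUBCLASS, PROV + "Entity"),     # data provenance
    (EX + "DataCollection", RDFS_SUBCLASS, PROV + "Activity"),  # process provenance
    # Hypothetical instances describing one study variable.
    (EX + "sleepDuration", RDF_TYPE, EX + "StudyVariable"),
    (EX + "polysomnography", RDF_TYPE, EX + "DataCollection"),
    (EX + "sleepLab", RDF_TYPE, PROV + "Agent"),
    (EX + "sleepDuration", PROV + "wasGeneratedBy", EX + "polysomnography"),
    (EX + "polysomnography", PROV + "wasAssociatedWith", EX + "sleepLab"),
}


def objects(subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def prov_kind(resource):
    """Follow rdf:type, then rdfs:subClassOf, up to a PROV-O core class name."""
    frontier = objects(resource, RDF_TYPE)
    seen = set()
    while frontier:
        cls = frontier.pop()
        if cls.startswith(PROV):
            return cls[len(PROV):]
        if cls not in seen:
            seen.add(cls)
            frontier.extend(objects(cls, RDFS_SUBCLASS))
    return None


if __name__ == "__main__":
    for name in ("sleepDuration", "polysomnography", "sleepLab"):
        print(name, "->", prov_kind(EX + name))
```

A real implementation would more likely use an RDF library and the published ontology files; the set of tuples here only shows how extension classes stay queryable through the PROV-O core classes.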