Sci Data. 2022 Sep 30;9(1):592.
doi: 10.1038/s41597-022-01707-6.

Machine actionable metadata models


Dominique Batista et al. Sci Data.

Abstract

Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually written as narratives intended for human consumption. Modular and reusable machine-readable versions are also needed, for two reasons. First, they provide the quantitative and verifiable measures of the degree to which metadata descriptors meet these community requirements, which the FAIR Principles require. Second, they encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be combined or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in the Life Sciences and discuss the challenges. Our work, targeted at developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.
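As a minimal sketch of the kind of quantitative, verifiable measure described above, one could score a metadata record by the fraction of checklist fields it fills at each requirement level (must, should, may). The checklist, the field names, and this scoring rule are hypothetical illustrations, not the measure defined in the paper.

```python
from typing import Any

# Hypothetical checklist: field name -> requirement level.
# Field names and levels are illustrative, not taken from any real guideline.
CHECKLIST = {
    "study": "must",
    "study_title": "must",
    "study_description": "should",
    "contact_email": "may",
}


def compliance(record: dict[str, Any], checklist: dict[str, str]) -> dict[str, float]:
    """Return, per requirement level, the fraction of checklist fields that are filled.

    A field counts as filled if it is present and not None or an empty string.
    """
    scores: dict[str, float] = {}
    for level in ("must", "should", "may"):
        fields = [name for name, lvl in checklist.items() if lvl == level]
        if not fields:
            continue  # no fields at this level, so no score to report
        filled = sum(1 for name in fields if record.get(name) not in (None, ""))
        scores[level] = filled / len(fields)
    return scores


record = {"study": "S1", "study_title": "Liver microarray", "study_description": ""}
print(compliance(record, CHECKLIST))  # {'must': 1.0, 'should': 0.0, 'may': 0.0}
```

Reporting a score per level, rather than one overall number, keeps a missing "must" field from being masked by many optional fields that happen to be filled.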


Conflict of interest statement

The authors have no competing interests. SAS is the Academic Editor of Scientific Data, and PRS is a member of its Senior Editorial Board.

Figures

Fig. 1
Difference in representation of the MIAME checklist in two public repositories: GEO and ArrayExpress. (A) GEO (10.25504/FAIRsharing.5hc8vt) and ArrayExpress (10.25504/FAIRsharing.6k0kwd) are two databases highly recommended by journal and funder data policies, and both implement, among others, the community-defined MIAME reporting guideline for describing microarray experiments (10.25504/FAIRsharing.32b10v). MIAME is implemented via several formats, used to upload and download datasets from these two databases: SOFT (10.25504/FAIRsharing.3gxr9) and MINiML (10.25504/FAIRsharing.gaegy8) for GEO; and, for ArrayExpress, MAGE-ML (10.25504/FAIRsharing.x964fb), now deprecated and superseded by MAGE-TAB (10.25504/FAIRsharing.ak8p5g). ArrayExpress also uses the EFO terminology (10.25504/FAIRsharing.1gr4tz) to annotate the metadata. (B) Using a few metadata requirements from MIAME as examples (namely study, study title, and study description), we illustrate how the metadata labels, along with their level of requirement (must, should, may), vary across the formats used by the two databases.
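The label divergence in panel (B) can be illustrated with a minimal sketch, assuming each format's labels are mapped to a shared term IRI in the spirit of a JSON-LD context. The labels below are illustrative, and the example.org IRIs are placeholders; neither should be read as the official GEO or ArrayExpress mapping.

```python
# Illustrative label -> term maps for two formats. The labels and the
# example.org IRIs are assumptions made for this sketch only.
SOFT_CONTEXT = {
    "Series_title": "http://example.org/terms/study_title",
    "Series_summary": "http://example.org/terms/study_description",
}
MAGETAB_CONTEXT = {
    "Investigation Title": "http://example.org/terms/study_title",
    "Experiment Description": "http://example.org/terms/study_description",
}


def to_shared_terms(record: dict[str, str], context: dict[str, str]) -> dict[str, str]:
    """Re-key a record from format-specific labels to shared term IRIs.

    Labels with no entry in the context are dropped.
    """
    return {context[label]: value for label, value in record.items() if label in context}


geo = {"Series_title": "Liver study", "Series_summary": "Microarray of liver tissue"}
ae = {"Investigation Title": "Liver study", "Experiment Description": "Microarray of liver tissue"}

# Once both records are expressed with the same terms, they can be compared directly.
print(to_shared_terms(geo, SOFT_CONTEXT) == to_shared_terms(ae, MAGETAB_CONTEXT))  # True
```

The comparison is made on the shared terms, not on the labels, so neither format's vocabulary has to be privileged.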
Fig. 2
How to create a reporting guideline that is machine-readable ab initio. (1) A checklist/reporting guideline is formally expressed as JSON schemas. 1*) Quality control step: JSON ScheeLD provides the means to validate the model against the JSON Schema specification, and the JSON Schema Documenter helps visualise models in the browser. (2) JSON ScheeLD creates JSON-LD context file stubs, and the user provides the mappings manually. 2*) Quality control step: use the JSON Schema Documenter to verify that all the fields are mapped to an ontology term. (3) Export the model to the CEDAR API and provide stable identifiers.
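Steps (2) and 2*) can be sketched as follows, assuming a context stub is simply one empty entry per schema property that a curator later fills with an ontology term IRI. The schema, the functions `context_stub` and `unmapped_fields`, and the example.org IRIs are hypothetical; they do not reproduce JSON ScheeLD's actual API or output.

```python
import json

# A toy JSON Schema for a hypothetical "study" element of a checklist.
STUDY_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Study",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "submission_date": {"type": "string"},
    },
    "required": ["title"],
}


def context_stub(schema: dict) -> dict:
    """Build a JSON-LD context stub with one empty mapping per schema property."""
    return {"@context": {name: "" for name in schema.get("properties", {})}}


def unmapped_fields(context: dict) -> list[str]:
    """Quality-control check: list properties that still lack a term IRI."""
    return sorted(name for name, iri in context["@context"].items() if not iri)


ctx = context_stub(STUDY_SCHEMA)
# The curator fills in the mappings manually (placeholder IRIs).
ctx["@context"]["title"] = "http://example.org/terms/study_title"
ctx["@context"]["description"] = "http://example.org/terms/study_description"
print(json.dumps(ctx, indent=2))
print(unmapped_fields(ctx))  # ['submission_date']
```

Keeping the mapping in a separate context file leaves the JSON Schema responsible only for structure and validation, while the semantics can be reviewed and changed independently.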
Fig. 3
How to merge two existing guidelines into a new set of schemas. (1) A developer uses the JSON Schema Documenter to explore the two guidelines, MIACME and MIACA. (2) JSON ScheeLD relies on the context files to compare the two models and outputs a file readable by the JSON Compare Viewer, which shows the developer which fields are semantically identical. (3) JSON ScheeLD pulls the fields from the MIACME model, injects those missing from MIACA into it, and creates a whole new set of schemas and context files. Directionality matters: merging MIACME into MIACA will not produce the same result as merging MIACA into MIACME. (4) After the merge is complete, the developer can return to step 2 and compare the new model with the old one as a quality-control check.
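The compare-then-merge workflow, and why directionality matters, can be sketched with two toy models whose contexts are reduced to plain `field -> IRI` dictionaries. The models stand in for MIACME and MIACA but their fields are made up, and `semantic_matches` and `merge_into` are hypothetical helpers, not JSON ScheeLD functions.

```python
import copy

# Toy stand-ins for MIACME and MIACA: field names and example.org IRIs
# are hypothetical, not taken from either guideline.
T = "http://example.org/terms/"
CTX_A = {"organism": T + "organism", "assay_type": T + "assay", "cell_line": T + "cell_line"}
CTX_B = {"species": T + "organism", "cell_line": T + "cell_line", "treatment": T + "treatment"}
SCHEMA_A = {"type": "object", "properties": {name: {"type": "string"} for name in CTX_A}}
SCHEMA_B = {"type": "object", "properties": {name: {"type": "string"} for name in CTX_B}}


def semantic_matches(ctx_a: dict, ctx_b: dict) -> set:
    """Return (field_a, field_b) pairs whose contexts map to the same term IRI."""
    field_by_iri = {iri: name for name, iri in ctx_b.items()}
    return {(name, field_by_iri[iri]) for name, iri in ctx_a.items() if iri in field_by_iri}


def merge_into(src_schema: dict, src_ctx: dict, dst_schema: dict, dst_ctx: dict):
    """Directional merge: copy src fields whose term IRI is absent from dst.

    Returns a new (schema, context) pair and leaves the inputs unchanged.
    A src field whose name already exists in dst with another IRI is
    skipped here; a real tool would need to report such clashes.
    """
    schema = copy.deepcopy(dst_schema)
    ctx = dict(dst_ctx)
    dst_iris = set(dst_ctx.values())
    for name, iri in src_ctx.items():
        if iri not in dst_iris and name not in ctx:
            schema["properties"][name] = copy.deepcopy(src_schema["properties"][name])
            ctx[name] = iri
    return schema, ctx


print(sorted(semantic_matches(CTX_A, CTX_B)))
a_into_b, _ = merge_into(SCHEMA_A, CTX_A, SCHEMA_B, CTX_B)
b_into_a, _ = merge_into(SCHEMA_B, CTX_B, SCHEMA_A, CTX_A)
print(sorted(a_into_b["properties"]))  # ['assay_type', 'cell_line', 'species', 'treatment']
print(sorted(b_into_a["properties"]))  # ['assay_type', 'cell_line', 'organism', 'treatment']
```

Both merges cover the same four terms, but the target's field names win ("species" in one result, "organism" in the other). This is one concrete way the two merge directions can produce different models.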
