Machine actionable metadata models

Dominique Batista¹, Alejandra Gonzalez-Beltran¹,², Susanna-Assunta Sansone¹, Philippe Rocca-Serra³

Affiliations

¹ Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK.
² Scientific Computing Department, Rutherford Appleton Laboratory, Science and Technology Facilities Council, Didcot, UK.
³ Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK. philippe.rocca-serra@oerc.ox.ac.uk.

PMID: 36180441
PMCID: PMC9525592
DOI: 10.1038/s41597-022-01707-6

Dominique Batista et al. Sci Data. 2022 Sep 30;9(1):592. doi: 10.1038/s41597-022-01707-6.

Abstract

Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually written as narratives intended for human consumption. Modular and reusable machine-readable versions are also needed, for two reasons. First, they provide the necessary quantitative and verifiable measures of the degree to which metadata descriptors meet these community requirements, which is a requirement of the FAIR Principles. Second, they encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be combined or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in the Life Sciences and discuss the challenges. Our work, targeted at developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.
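As an illustration of the first motivation, a quantitative compliance measure, the minimal sketch below assumes a checklist encoded as a JSON Schema whose `required` array lists the mandatory ("must") fields, and scores a record by the fraction of those fields that are present and non-empty. The schema, the record, and the `compliance` helper are hypothetical and are not part of the authors' tooling; a full validator would also check types, formats, and the optional ("should", "may") fields.

```python
# Hypothetical sketch: scoring a metadata record against a checklist
# expressed as a JSON Schema. Only the top-level "required" and "properties"
# keywords are used; a real JSON Schema validator checks far more.
from typing import Any

STUDY_SCHEMA: dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Study (hypothetical example)",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "submission_date": {"type": "string"},
    },
    "required": ["title", "description"],
}


def compliance(record: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Report which required fields are present and non-empty, plus a 0..1 score."""
    required = schema.get("required", [])
    present = [f for f in required if record.get(f) not in (None, "", [], {})]
    missing = [f for f in required if f not in present]
    score = len(present) / len(required) if required else 1.0
    return {"present": present, "missing": missing, "score": score}


record = {"title": "Liver time course", "submission_date": "2022-09-30"}
print(compliance(record, STUDY_SCHEMA))
# → {'present': ['title'], 'missing': ['description'], 'score': 0.5}
```

Reporting the missing fields alongside the score keeps the measure verifiable: a curator can see exactly which requirement lowered it.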

Conflict of interest statement

The authors have no competing interests. SAS is the Academic Editor of Scientific Data, and PRS is a member of its Senior Editorial Board.

Figures

**Fig. 1**
Difference in representation of the MIAME checklist in two public repositories: GEO and ArrayExpress. **(A)** GEO (10.25504/FAIRsharing.5hc8vt) and ArrayExpress (10.25504/FAIRsharing.6k0kwd) are two databases highly recommended by journal and funder data policies, and both implement, among others, the community-defined MIAME reporting guideline (10.25504/FAIRsharing.32b10v) for describing microarray experiments. MIAME is implemented via several formats used to upload and download datasets from these two databases: SOFT (10.25504/FAIRsharing.3gxr9) and MINiML (10.25504/FAIRsharing.gaegy8) for GEO; and, for ArrayExpress, MAGE-ML (10.25504/FAIRsharing.x964fb), now deprecated and superseded by MAGE-TAB (10.25504/FAIRsharing.ak8p5g). ArrayExpress also uses the EFO terminology (10.25504/FAIRsharing.1gr4tz) to annotate the metadata. **(B)** Using a few metadata requirements from MIAME as an example (namely study, study title and study description), we illustrate how the metadata labels, along with their level of requirement (must, should, may), vary across the formats used by the two databases.

**Fig. 2**
How to create a reporting guideline that is machine-readable *ab initio*. (1) A checklist/reporting guideline is formally expressed as JSON schemas. (1*) Quality control step: *JSON ScheeLD* provides the means to validate the model against the JSON Schema specification, and the *JSON Schema Documenter* helps visualise models in the browser. (2) *JSON ScheeLD* creates JSON-LD context file stubs, and the user provides the mapping to ontology terms manually. (2*) Quality control step: use the *JSON Schema Documenter* to verify that all the fields are mapped to an ontology term. (3) Export to the CEDAR API and provide stable identifiers.
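Steps (2) and (2*) of Fig. 2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the behaviour of *JSON ScheeLD* or the *JSON Schema Documenter*: the schema is a toy, the context stub is assumed to be a flat `@context` object with one term per schema property, and the IRIs are `example.org` placeholders rather than real ontology terms. The helper names `context_stub` and `unmapped_fields` are hypothetical.

```python
# Hypothetical sketch of Fig. 2, steps (2) and (2*): derive a JSON-LD context
# stub from a JSON Schema, let a curator fill in IRIs, then check that every
# field is mapped. Not the output format of any real tool.
from typing import Any

schema: dict[str, Any] = {
    "title": "Study (hypothetical example)",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "submission_date": {"type": "string"},
    },
}


def context_stub(schema: dict[str, Any]) -> dict[str, Any]:
    """Step (2): one empty term per schema property, to be filled in manually."""
    return {"@context": {name: "" for name in schema.get("properties", {})}}


def unmapped_fields(schema: dict[str, Any], context: dict[str, Any]) -> list[str]:
    """Step (2*): list schema properties that still have no mapping in the context."""
    mappings = context.get("@context", {})
    return [name for name in schema.get("properties", {}) if not mappings.get(name)]


ctx = context_stub(schema)
# Manual curation; these IRIs are placeholders, not real ontology terms.
ctx["@context"]["title"] = "http://example.org/terms/study_title"
ctx["@context"]["description"] = "http://example.org/terms/study_description"

print(unmapped_fields(schema, ctx))
# → ['submission_date']
```

Leaving stub values empty until a curator fills them turns the quality-control step into a simple emptiness check. A real context may also map a term to an object with an `@id`; this check would accept any such non-empty value as mapped without inspecting it.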

**Fig. 3**
How to merge two existing guidelines into a new set of schemas. (1) A developer uses the *JSON Schema Documenter* to explore the two guidelines, MIACME and MIACA. (2) *JSON ScheeLD* relies on the context files to compare the two models and outputs a file readable by the *JSON Compare Viewer*, which lets the developer see which fields are semantically identical. (3) *JSON ScheeLD* pulls the fields from the MIACME model, injects those that are missing into the MIACA model, and creates a whole new set of schemas and context files. Directionality is important: merging MIACME into MIACA will not produce the same result as merging MIACA into MIACME. (4) After the merge is complete, the developer can return to step (2) and compare the new model with the old one as a quality-control check.
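The directional merge in step (3) of Fig. 3 can be illustrated with toy models. The sketch assumes that each model is reduced to a flat `properties` dictionary plus a JSON-LD `@context` mapping field names to IRIs, and that two fields are semantically identical when they share an IRI. The field names and IRIs standing in for MIACME and MIACA are invented, as are the helpers `semantic_matches` and `merge_into`; this is not the algorithm implemented by *JSON ScheeLD*.

```python
# Hypothetical sketch of the directional merge in Fig. 3. A model is a flat
# JSON Schema "properties" dict plus a JSON-LD "@context" mapping field names
# to IRIs; fields sharing an IRI are treated as semantically identical.
from typing import Any

Model = dict[str, Any]  # {"properties": {...}, "@context": {...}}


def semantic_matches(a: Model, b: Model) -> dict[str, str]:
    """Pair each field of `a` with the field of `b` that shares its IRI."""
    by_iri = {iri: name for name, iri in b["@context"].items() if iri}
    return {name: by_iri[iri] for name, iri in a["@context"].items() if iri in by_iri}


def merge_into(source: Model, target: Model) -> Model:
    """Return a new model: `target` plus the `source` fields it lacks."""
    merged: Model = {
        "properties": dict(target["properties"]),
        "@context": dict(target["@context"]),
    }
    matched = semantic_matches(source, target)
    for name, prop in source["properties"].items():
        # Skip fields with a semantic twin in the target and, in this toy
        # version, fields whose name already exists there with another meaning.
        if name in matched or name in merged["properties"]:
            continue
        merged["properties"][name] = prop
        if name in source["@context"]:
            merged["@context"][name] = source["@context"][name]
    return merged


# Hypothetical stand-ins for MIACME and MIACA; not their real fields.
toy_miacme: Model = {
    "properties": {"organism": {"type": "string"}, "growth_medium": {"type": "string"}},
    "@context": {
        "organism": "http://example.org/terms/organism",
        "growth_medium": "http://example.org/terms/medium",
    },
}
toy_miaca: Model = {
    "properties": {"species": {"type": "string"}, "cell_line": {"type": "string"}},
    "@context": {
        "species": "http://example.org/terms/organism",
        "cell_line": "http://example.org/terms/cell_line",
    },
}

a = merge_into(toy_miacme, toy_miaca)  # MIACME merged into MIACA
b = merge_into(toy_miaca, toy_miacme)  # MIACA merged into MIACME
print(sorted(a["properties"]))  # → ['cell_line', 'growth_medium', 'species']
print(sorted(b["properties"]))  # → ['cell_line', 'growth_medium', 'organism']
```

Because only source fields without a semantic counterpart are copied, the merged model keeps the target's naming (`species` rather than `organism`), which is one way swapping the two arguments yields a different model.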
