Modeling community standards for metadata as templates makes data FAIR

Mark A Musen¹, Martin J O'Connor², Erik Schultes³, Marcos Martínez-Romero²,⁴, Josef Hardi², John Graybeal²

Affiliations

¹ Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA. musen@stanford.edu.
² Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA.
³ GO FAIR Foundation, Rijnsburgerweg 10, 2333 AA, Leiden, Netherlands.
⁴ Acubed Innovation Center, 601 West California Avenue, Sunnyvale, CA, 94086, USA.

PMID: 36371407
PMCID: PMC9653497
DOI: 10.1038/s41597-022-01815-3

Mark A Musen et al. Sci Data. 2022 Nov 12;9(1):696. doi: 10.1038/s41597-022-01815-3.

Abstract

It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets, both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.

Conflict of interest statement

ES is the Scientific Director of Partners in FAIR (https://partnersinfair.com). There are no other competing interests.

Figures

**Fig. 1**
Metadata template for capturing information about a tissue sample. This screen capture shows the template used by investigators in the NIH-supported HuBMAP consortium to specify metadata about biological specimens used to perform assays of cell-specific biomarkers. In the figure, the user is entering a controlled term from a special HuBMAP ontology to provide the metadata entry for the specimen’s preparation medium. The attributes of tissues are the ones that the HuBMAP community has chosen to standardize for its descriptions of such samples. The ontology terms used to provide values for the metadata attributes similarly represent community-endorsed standards for declaring this kind of information.

**Fig. 2**
A collection of metadata templates in the CEDAR library. The screen capture depicts a set of templates created by HuBMAP users or shared with their community members. In CEDAR, users may view and access their own metadata templates, templates explicitly shared with the user by others, and templates shared by designated research communities stored in “community folders.” Here, the user is seeking to populate the Sample Section template, which appears in Fig. 1.

**Fig. 3**
FAIRware Workbench analysis of a metadata record for a tissue sample. The screen capture shows the analysis of one of the records in the repository, indicating where the reporting guideline may not have been followed or where ontology terms were not used appropriately. The system automatically corrects the string “208 days” to the integer 208. There is no obvious correction for the entry for “storage medium.” Because the FAIRware Workbench is in interactive mode in this example, it offers the user a menu of ontology terms that might provide a standards-adherent value.

**Fig. 4**
FAIRware Workbench summary analysis. The Workbench provides the user with an overview of how well the input data adhere to the standard defined by the metadata template indicated at runtime. We can see that, overall, there are many records with missing required fields, and several records with field values that do not adhere to standards (such as the use of standard ontology terms). At the bottom of the screen, users can see more detail and review which metadata fields cause the most difficulty.
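
The kind of checking described for Figs. 3 and 4 can be illustrated with a minimal sketch. It is not the FAIRware Workbench implementation: the template structure, the field names (`specimen_age_days`, `storage_medium`), the allowed-term list, and the functions `check_record` and `summarize` are all assumptions made for illustration. The sketch checks each record against a template, coerces a string such as "208 days" to the integer 208, flags values outside an allowed term list, and tallies the problems per field.

```python
import re
from collections import Counter

# Hypothetical template: each field declares a type, whether it is required,
# and (optionally) a set of allowed terms. None of this is CEDAR's real format.
TEMPLATE = {
    "specimen_age_days": {"type": int, "required": True},
    "storage_medium": {
        "type": str,
        "required": True,
        "allowed": {"formalin", "paraffin", "OCT compound"},  # illustrative terms
    },
}


def check_record(record, template):
    """Return (repaired_record, issues) for one metadata record."""
    repaired = dict(record)
    issues = []
    for field, spec in template.items():
        value = record.get(field)
        if value is None or value == "":
            if spec.get("required"):
                issues.append((field, "missing required value"))
            continue
        if spec["type"] is int and not isinstance(value, int):
            # Accept strings such as "208" or "208 days" and keep the number.
            match = re.fullmatch(r"\s*(\d+)\s*(days?)?\s*", str(value))
            if match:
                repaired[field] = int(match.group(1))
                issues.append((field, "coerced to integer"))
            else:
                issues.append((field, "not an integer"))
        elif "allowed" in spec and value not in spec["allowed"]:
            # No automatic fix: an interactive tool could offer candidate terms.
            issues.append((field, "value not in allowed terms"))
    return repaired, issues


def summarize(records, template):
    """Count how often each (field, problem) pair occurs across records."""
    counts = Counter()
    for record in records:
        _, issues = check_record(record, template)
        counts.update(issues)
    return counts
```

Calling `check_record({"specimen_age_days": "208 days", "storage_medium": "unknown"}, TEMPLATE)` repairs the age field and reports the storage medium as non-adherent; `summarize` aggregates such reports over a whole collection of records.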

**Fig. 5**
The JSON-LD representation of the HuBMAP metadata seen in Fig. 1. The explicit incorporation of the persistent identifiers of ontology terms provides a semantic foundation for the corresponding metadata fields. We can see, for example, that the value for “preparation medium” refers to a term from MeSH.
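
To make the idea in Fig. 5 concrete, the following sketch builds a small JSON-LD-style record in which a field value carries the identifier of an ontology term. It is an illustration only and does not reproduce CEDAR's actual serialization: the field name `preparation_medium`, the `example.org` IRIs, the label text, and the helper `term_iris` are placeholders, and no real MeSH identifier is shown.

```python
import json

# Illustrative JSON-LD-style record. The example.org IRIs are placeholders,
# not real ontology or MeSH identifiers.
record = {
    "@context": {
        "preparation_medium": "https://example.org/schema/preparation_medium",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
    },
    "preparation_medium": {
        "@id": "https://example.org/ontology/TERM_0001",
        "label": "example preparation medium",
    },
}


def term_iris(node):
    """Collect every '@id' found in the field values, skipping '@context'."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue
            if key == "@id" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(term_iris(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(term_iris(item))
    return found


serialized = json.dumps(record, indent=2)
```

Because the value is an identifier rather than a free-text string, software that reads the record can resolve or compare the term without guessing at spelling variants; `term_iris(record)` lists the identifiers the record relies on.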

Grants and funding

WT_/Wellcome Trust/United Kingdom
