Environ Health Perspect. 2023 Jun;131(6):65001.
doi: 10.1289/EHP11484. Epub 2023 Jun 23.

A Case for Accelerating Standards to Achieve the FAIR Principles of Environmental Health Research Experimental Data

Rance Nault et al. Environ Health Perspect. 2023 Jun.

Abstract

Background: Funding agencies, publishers, and other stakeholders are pushing environmental health science investigators to improve data sharing; to promote the findable, accessible, interoperable, and reusable (FAIR) principles; and to increase the rigor and reproducibility of the data collected. Accomplishing these goals will require significant cultural shifts surrounding data management and strategies to develop robust and reliable resources that bridge the technical challenges and gaps in expertise.

Objective: In this commentary, we examine the current state of managing data and metadata, referred to collectively as (meta)data, in the experimental environmental health sciences. We introduce new tools and resources based on in vivo experiments to serve as examples for the broader field.

Methods: We discuss previous and ongoing efforts to improve (meta)data collection and curation. These include global efforts by the Functional Genomics Data Society to develop metadata collection tools such as the Investigation, Study, Assay (ISA) framework, and the Center for Expanded Data Annotation and Retrieval. We also conduct a case study of in vivo data deposited in the Gene Expression Omnibus that demonstrates the current state of in vivo environmental health data and highlights the value of using the tools we propose to support data deposition.

Discussion: The environmental health science community has played a key role in efforts to achieve the goals of the FAIR guiding principles and is well positioned to advance them further. We present a proposed framework to further promote these objectives and minimize the obstacles between data producers and data scientists to maximize the return on research investments. https://doi.org/10.1289/EHP11484.

Figures

Figure 1 is a Venn diagram with three circles: Minimum Information about Animal Toxicology Experiments (34) at the top, Toxicology Experiment Reporting Module (64) at the bottom left, and Tox Bio Checklist (25) at the bottom right. The central intersection is labeled 22. Labels in the top circle: chemical source details, housing conditions (10); feed or water details (7); fasting information (2). Labels in the bottom-left circle: chemical source details, housing conditions (10); test item details, breeding, pretreatments (68); acclimation details, animals per cage, anesthetic used (3). Labels in the bottom-right circle: acclimation details, animals per cage, anesthetic used (3); genotype information (4); fasting information (2).
Figure 1.
Comparison of metadata terms listed in the TERM, TBC, and MIATE/invivo reporting standards. Each standard was manually examined. Only metadata requirements relevant to in vivo experiments for each reporting standard were included for comparison. Because of the use of different vocabularies, each reporting standard metadata term was manually mapped to an equivalent term in the other reporting standards if one were present. Mapping summary can be found in Excel Table S1. The Venn diagram shows the number of common and unique terms for each reporting standard. Note: MIATE/invivo, Minimum Information about Animal Toxicology Experiments in Vivo; TBC, Tox Bio Checklist; TERM, Toxicology Experiment Reporting Module.
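The counting behind a Venn diagram like this one can be sketched with plain set operations. The block below is only an illustration under stated assumptions: the term sets are toy examples, not the paper's manual mappings (those are in Excel Table S1), and the `venn_counts` helper is a hypothetical name, not code from the MIATE/invivo resources.

```python
from itertools import combinations

# Toy term sets for illustration only; NOT the mappings from Excel Table S1.
standards = {
    "TERM": {"strain", "sex", "dose", "vehicle", "test item details"},
    "TBC": {"strain", "sex", "dose", "genotype information"},
    "MIATE/invivo": {"strain", "sex", "dose", "vehicle", "feed or water details"},
}


def venn_counts(sets):
    """Count the terms in each exclusive Venn region.

    A region is identified by the tuple of standards that contain the term;
    terms that also appear in any other standard are excluded from it.
    """
    names = list(sets)
    counts = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            others = [sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            counts[combo] = len(inside - outside)
    return counts


counts = venn_counts(standards)
for region, n in counts.items():
    print(" + ".join(region), n)
```

Because the regions are exclusive, their counts sum to the number of distinct terms across all standards, which is a convenient sanity check on any manual mapping.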
Figure 2A is a histogram plotting the percentage of data sets reporting each metadata term (y-axis, 0 to 75 in increments of 25) against the metadata terms (x-axis), for the categories TERM + TBC + MIATE/invivo; TERM + MIATE/invivo; TERM + TBC; TBC + MIATE/invivo; TERM; TBC; and MIATE/invivo.
Figure 2B is a heatmap plotting GEO data sets (y-axis: GSE104064, GSE83199, GSE83198, GSE83197, GSE63902, GSE59495, GSE24363, GSE90614, GSE55084, GSE18858, GSE178168, GSE171942, GSE171941, GSE167328, GSE148339) against the same metadata terms (x-axis), with each cell shown as reported or not reported.
Metadata terms on the x-axis of both panels: strain, substance, sex, age, time point, dose, administration route, feed name, administration interval, vehicle, CAS number, euthanasia method, ad lib (yes or no), supplier, average vivarium humidity, bedding type, cage type, chemical catalog number, chemical purity, vivarium light cycle, time since last dose administered, interventions, delivery volume, environmental enrichment, targeted locus, feed source, water type, number of administrations, formula, SMILES string, acclimation duration, number per cage, fasting duration, international chemical code, breeding program, concentration of diluents, developmental stage, expiration date, exposure design, exposure duration, final concentration of vehicle, health status and acclimation, identification of impurities, lot (batch) number, maximum volume of liquid administered, molecular weight, percentage of substances, pretreatments (for example, metabolic activation), quality criteria before use, randomization of animals to groups, salt form, storage, recovery period, type of facility, types of replicates, additional treatment, alteration (mutation, knock-in, RNAi, etc.).
Figure 2.
Evaluation of GEO deposited metadata conformance to the in vivo experimental environmental health science reporting standards TERM, TBC, and MIATE/invivo. GEO was queried for mouse and rat data sets using the keywords “dose-response,” “toxicology,” or “dose,” followed by computational extraction of metadata. Following semiautomated metadata assessment to remove studies involving biological (e.g., infections), physical (e.g., cold temperature), or psychological stressors, a total of 1,233 GEO data sets were examined (Excel Table S2). Expected terms were compared to requested metadata in TERM, TBC, and MIATE/invivo reporting standards (Excel Table S3). (A) Percentage of data sets providing each reporting standard term is shown. (B) Top 15 data sets providing the most terms present in any reporting standards shown as a heat map indicating which term was reported. Bolded GEO accession identifiers represent data sets using a draft version of MIATE/invivo. Documentation and code for reproducibility are provided as a supplemental document (MIATE-3.0.1.zip) and available as part of MIATE/invivo resources (doi.org/10.5281/zenodo.7667576). Data used for plotting are provided in supplementary tables (Excel Table S4–S5). Note: GEO, Gene Expression Omnibus; MIATE/invivo, Minimum Information about Animal Toxicology Experiments in Vivo; TBC, Tox Bio Checklist; TERM, Toxicology Experiment Reporting Module.
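The two summaries in this caption (percentage of data sets reporting each term, and the data sets reporting the most terms) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the records and accession labels are invented placeholders, and `percent_reporting` and `top_datasets` are hypothetical helper names. The actual documentation and code are in the MIATE/invivo resources cited above.

```python
from collections import Counter

# Placeholder records: data set label -> metadata terms found in its deposit.
# These are invented for illustration and are not real GEO accessions.
reported = {
    "DS1": {"strain", "sex", "dose"},
    "DS2": {"strain", "dose"},
    "DS3": {"strain"},
    "DS4": {"strain", "sex", "dose", "vehicle"},
}
expected_terms = ["strain", "sex", "dose", "vehicle"]


def percent_reporting(reported, expected_terms):
    """Percentage of data sets that report each expected term (panel A)."""
    if not reported:
        raise ValueError("no data sets to summarize")
    tally = Counter(term for terms in reported.values() for term in terms)
    return {t: 100.0 * tally[t] / len(reported) for t in expected_terms}


def top_datasets(reported, expected_terms, k):
    """The k data sets reporting the most expected terms (rows of panel B)."""
    expected = set(expected_terms)
    ranked = sorted(
        reported, key=lambda ds: len(reported[ds] & expected), reverse=True
    )
    return ranked[:k]


percentages = percent_reporting(reported, expected_terms)
best = top_datasets(reported, expected_terms, k=2)
print(percentages)
print(best)
```

In a real analysis the hard part is upstream of this step: extracting terms from free-text GEO fields and mapping them onto a shared vocabulary, which the caption describes as semiautomated.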
