F1000Res. 2024 Mar 8:13:169.
doi: 10.12688/f1000research.141056.1. eCollection 2024.

Translating nanoEHS data using EPA NaKnowBase and the resource description framework

Holly M Mortensen et al. F1000Res. 2024.

Abstract

Background: The U.S. Federal Government has supported the generation of extensive amounts of nanomaterials and related nano Environmental Health and Safety (nanoEHS) data, and there is a need to make these data available to stakeholders. Recent efforts have highlighted the need for improved interoperability, translation, and sustainability of Federal nanoEHS data in the United States. The NaKnowBase (NKB) is a relational database containing experimental results generated by the EPA Office of Research and Development (ORD) regarding the actions of engineered nanomaterials on environmental and biological systems. Through the interaction of the National Nanotechnology Initiative's Nanotechnology Environmental Health Implications (NEHI) Working Group and the Database and Informatics Interest Group (DIIG), a U.S. Federal nanoEHS Consortium has been formed.

Methods: The primary goal of this consortium is to establish a "common language" for nanoEHS data that aligns with FAIR data standards. A second goal is to overcome nomenclature issues inherent to nanomaterials data, ultimately allowing data sharing and interoperability across the diverse U.S. Federal nanoEHS data compendium while keeping a level of consistency that will allow interoperability with U.S. and European partners. The most recent version of the EPA NaKnowBase (NKB) has been implemented for semantic integration. Computational code has been developed to take each NKB record as input, modify and filter the table data, and output each modified record in the Resource Description Framework (RDF). To improve the accuracy and efficiency of this process, the EPA has created the OntoSearcher tool. This tool partially automates the ontology mapping process, thereby reducing onerous manual curation.
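The record-to-RDF step described above can be pictured with a minimal sketch. It is not the EPA code: the namespace `https://example.org/nkb/`, the column names, and the rule for what counts as a missing value are all assumptions made for illustration. The sketch takes one relational row, drops empty fields, and writes the rest as N-Triples lines.

```python
from urllib.parse import quote

BASE = "https://example.org/nkb/"  # hypothetical namespace, not the EPA's


def literal(value):
    """Serialize a value as an N-Triples plain literal with basic escaping."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def record_to_triples(table, record, id_column):
    """Turn one relational row into N-Triples lines, skipping empty fields."""
    subject = f"<{BASE}{table}/{quote(str(record[id_column]))}>"
    lines = []
    for column, value in record.items():
        # Filter step (assumed rule): skip the key column and missing values.
        if column == id_column or value in (None, "", "NA"):
            continue
        predicate = f"<{BASE}{quote(column)}>"
        lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


# Hypothetical row; the real NKB columns may differ.
row = {"material_id": 6, "core": "silver", "coating": "citrate", "shape": None}
for line in record_to_triples("material", row, "material_id"):
    print(line)
```

Every object is written as a plain literal here; a fuller conversion would presumably emit IRIs for ontology terms and typed literals for numbers.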

Conclusions: Here we describe the efforts of the US EPA in promoting FAIR data standards for Federal nanoEHS data through semantic integration, as well as in the development of NAMs (computational tools) to facilitate these improvements for nanoEHS data at the Federal partner level.

Keywords: nanomaterial; database; ontology.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Unique names for materials in the NKB were generated using natural language processing (NLP) descriptions of the materials.
These descriptions included physical, chemical, and commercial traits.
Figure 2. The NaKnowBase chemical substance list is accessible through the CompTox Chemicals Dashboard at https://comptox.epa.gov/dashboard/chemical-lists/naknowbase and provides access to each member of the list and their related substances.
Figure 3. The name “nano-silver_na_citrate_75_nanometers_nanocomposix_na_6” maps onto the definition displayed in the figure.
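As a rough illustration of how such a name could be assembled once the descriptors have been extracted, the sketch below joins normalized fields with underscores and writes "na" for missing ones. The meaning and order of the fields are guesses; only the resulting string is taken from the caption, and the NLP extraction itself is not shown.

```python
import re


def slug(value):
    """Lowercase a descriptor and turn whitespace into hyphens; missing -> 'na'."""
    if value is None or str(value).strip() == "":
        return "na"
    return re.sub(r"\s+", "-", str(value).strip().lower())


def material_name(descriptors):
    """Join descriptor slugs with underscores, in the order given."""
    return "_".join(slug(d) for d in descriptors)


# Hypothetical field layout; which trait occupies which slot is an assumption.
fields = ["Nano-Silver", None, "Citrate", 75, "nanometers", "nanoComposix", None, 6]
print(material_name(fields))
# nano-silver_na_citrate_75_nanometers_nanocomposix_na_6
```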
Figure 4. The first phase of the OntoSearcher workflow is an iterative exploration performed by the user and assisted by OntoSearcher.
The user manually selects one or more ontologies, uses matcher() to see how well they cover the terms in the dataset, and is provided with suggestions for additional ontologies by bioportal_search(). This process continues until the user is satisfied with the coverage provided by the selected ontologies.
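The coverage check in this first phase might look roughly like the sketch below. It is not OntoSearcher's matcher(): the function names, the 0.85 threshold, and the toy ontologies are assumptions, and the standard-library `difflib` stands in for whatever string matcher the tool actually uses.

```python
from difflib import SequenceMatcher


def best_match(term, labels, threshold=0.85):
    """Return (label, score) for the closest label, or None below the threshold."""
    scored = [(SequenceMatcher(None, term.lower(), lab.lower()).ratio(), lab)
              for lab in labels]
    score, label = max(scored, default=(0.0, None))
    return (label, score) if score >= threshold else None


def coverage(terms, ontologies):
    """Map each term to its best (ontology, label, score); list the rest."""
    matched = {}
    for term in terms:
        for name, labels in ontologies.items():
            hit = best_match(term, labels)
            if hit and (term not in matched or hit[1] > matched[term][2]):
                matched[term] = (name, hit[0], hit[1])
    unmatched = [t for t in terms if t not in matched]
    return matched, unmatched


# Toy data: ontology names and labels are invented for illustration.
terms = ["zeta potential", "hydrodynamic diameter", "coating"]
ontologies = {
    "toy_onto_a": ["Zeta Potential", "particle size"],
    "toy_onto_b": ["hydrodynamic diameters", "surface coating"],
}
matched, unmatched = coverage(terms, ontologies)
print(f"covered {len(matched)} of {len(terms)} terms; still unmatched: {unmatched}")
```

In the iterative loop the caption describes, the user would add another ontology to the dictionary and rerun `coverage` until the unmatched list is acceptably short.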
Figure 5. The second phase of the OntoSearcher workflow is a software-guided mapping of the dataset.
The user processes each table from their dataset individually. The table is converted to triples, and the results of the matcher() method are used to automatically convert as many terms as possible into their equivalents from the ontologies. The remaining unmapped terms are reported to the user for manual curation. Once the manual mappings are supplied to OntoSearcher, it finishes handling the replacements.
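A minimal sketch of this second phase follows, under the simplifying assumption that the terms to be mapped are the column names used as predicates. The helper names and the `ex:` identifiers are hypothetical and are not part of OntoSearcher.

```python
def table_to_triples(rows, id_column):
    """Convert table rows to (subject, predicate, object) tuples."""
    return [(row[id_column], col, val)
            for row in rows
            for col, val in row.items()
            if col != id_column]


def apply_mappings(triples, mappings):
    """Replace mapped predicates; report the predicates still unmapped."""
    mapped = [(s, mappings.get(p, p), o) for s, p, o in triples]
    unmapped = sorted({p for _, p, _ in triples if p not in mappings})
    return mapped, unmapped


rows = [{"id": "m1", "core": "TiO2", "zeta": -30.5}]
raw = table_to_triples(rows, "id")

auto = {"core": "ex:hasCore"}            # stand-in for automatic matcher() results
_, todo = apply_mappings(raw, auto)      # todo lists terms needing manual curation

manual = {"zeta": "ex:zetaPotential"}    # supplied by the curator
final, remaining = apply_mappings(raw, {**auto, **manual})
print(todo, final, remaining)
```

Merging the automatic and manual dictionaries and reapplying them to the raw triples keeps the step idempotent: already-replaced predicates are never mistaken for unmapped terms.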
Figure 6. In each table, the first, vertically aligned column lists the shared subject.
The second column lists the predicate. The third column lists the object of the triple, which is either a simple object or a “bag” of triples sharing a common theme. Bags are further extended to show their predicates and objects. Connections are made through a combination of DOIs and IDs, represented by the lines in the schema. Since all data in the NKB is sourced from a paper, all types of data draw their DOI from the Publication triples. The IDs are used for differentiation of mediums, materials, and assays within a DOI. Mediums are marked in orange, materials are in blue, and assays are in green. This schema trims each table in the interest of space.
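One way to picture the subject, predicate, and object layout with nested bags is the sketch below. It assumes a bag can be modeled as a blank node holding its own predicates and objects, and that a subject is identified by its DOI plus a per-paper ID; both choices, the placeholder DOI, and the property names are illustrative assumptions rather than the published schema.

```python
from itertools import count


def flatten(subject, properties, counter=None):
    """Expand nested dicts ("bags") into triples linked by blank-node labels."""
    counter = counter if counter is not None else count(1)
    triples = []
    for predicate, obj in properties.items():
        if isinstance(obj, dict):
            bag = f"_:bag{next(counter)}"
            triples.append((subject, predicate, bag))     # link subject to bag
            triples.extend(flatten(bag, obj, counter))    # expand the bag itself
        else:
            triples.append((subject, predicate, obj))
    return triples


doi, material_id = "10.0000/example", 1          # placeholder identifiers
subject = f"{doi}#material{material_id}"         # DOI + ID distinguishes materials
material = {"doi": doi, "size": {"value": 75, "unit": "nanometers"}}
for triple in flatten(subject, material):
    print(triple)
```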
Figure 7. A simple query on the NaKnowBase RDF.
This query finds all materials in the NaKnowBase listed as having a Titanium Dioxide core, then groups the results by source publication and limits the display to the top 5.
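The logic of such a query (filter, group, limit) can be mirrored in plain Python over a list of triples, as sketched below. This is not the SPARQL from the figure: the predicate names `core` and `doi` and the toy triples are assumptions, and the sketch assumes that "top 5" means the publications with the most matching materials.

```python
from collections import Counter


def top_publications(triples, core="Titanium Dioxide", limit=5):
    """Count materials with the given core per source DOI; return the top few."""
    cored = {s for s, p, o in triples if p == "core" and o == core}
    per_doi = Counter(o for s, p, o in triples if p == "doi" and s in cored)
    return per_doi.most_common(limit)


# Toy triples with placeholder DOIs.
triples = [
    ("m1", "core", "Titanium Dioxide"), ("m1", "doi", "10.0000/a"),
    ("m2", "core", "Titanium Dioxide"), ("m2", "doi", "10.0000/a"),
    ("m3", "core", "Silver"),           ("m3", "doi", "10.0000/b"),
    ("m4", "core", "Titanium Dioxide"), ("m4", "doi", "10.0000/b"),
]
print(top_publications(triples))
```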
Figure 8. A federated query using the NaKnowBase RDF and the AOPDB SPARQL endpoint.
This query calls to the AOPDB SPARQL endpoint for information on the relationships between materials (by CASRN), genes, and pathways. Then, locally, the results are used to query the NaKnowBase RDF and determine which NKB materials correspond to the results from the AOPDB. The results show the 5 material-gene combinations that impact the most pathways, as well as the DOI of the source paper for that material in the NKB. The materials are reported by link to the CompTox Chemicals Dashboard.
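The join at the heart of this federated query can be sketched without any endpoint: rows of the kind the remote service might return are matched to local materials by CASRN, and material-gene pairs are ranked by the number of distinct pathways. The row layout, the identifiers, and the tie-breaking rule are assumptions, and the data are placeholders rather than AOPDB or NKB content.

```python
from collections import defaultdict


def top_material_gene_pairs(remote_rows, local_materials, limit=5):
    """Rank (CASRN, gene) pairs by distinct pathways, keeping local materials only.

    remote_rows: iterable of (casrn, gene, pathway) tuples
    local_materials: dict mapping casrn -> source DOI
    """
    pathways = defaultdict(set)
    for casrn, gene, pathway in remote_rows:
        if casrn in local_materials:                 # join on CASRN
            pathways[(casrn, gene)].add(pathway)
    # Most pathways first; ties broken alphabetically for a stable order.
    ranked = sorted(pathways.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(casrn, gene, len(found), local_materials[casrn])
            for (casrn, gene), found in ranked[:limit]]


# Placeholder identifiers only.
remote = [("CAS-A", "GENE1", "P1"), ("CAS-A", "GENE1", "P2"),
          ("CAS-A", "GENE2", "P1"), ("CAS-B", "GENE1", "P1")]
local = {"CAS-A": "10.0000/x"}
print(top_material_gene_pairs(remote, local))
```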
