AI-based nanotoxicity data extraction and prediction of nanotoxicity

Eunyong Ha et al. Comput Struct Biotechnol J. 2025 Apr 3;29:138-148. doi: 10.1016/j.csbj.2025.03.052. eCollection 2025.

Abstract

With the growing use of nanomaterials (NMs), assessing their toxicity has become increasingly important. Among toxicity assessment methods, computational models for predicting nanotoxicity are emerging as alternatives to traditional in vitro and in vivo assays, which involve high costs and ethical concerns. As a result, the qualitative and quantitative importance of data is now widely recognized. However, collecting large, high-quality datasets is both time-consuming and labor-intensive. Artificial intelligence (AI)-based data extraction techniques hold significant potential for extracting and organizing information from unstructured text, yet the use of large language models (LLMs) and prompt engineering for nanotoxicity data extraction has not been widely studied. In this study, we developed an AI-based automated data extraction pipeline to facilitate efficient data collection. The automation process was implemented using Python-based LangChain. We used 216 nanotoxicity research articles as training data to refine prompts and evaluate LLM performance. The most suitable LLM, with the refined prompts, was then used to extract test data from 605 research articles. Data extraction performance on the training data achieved F1_D.E. (F1 score for Data Extraction) values ranging from 84.6 % to 87.6 % across the different LLMs. Using the dataset extracted from the test set, we then constructed automated machine learning (AutoML) models that achieved F1_N.P. (F1 score for Nanotoxicity Prediction) values exceeding 86.1 % in predicting nanotoxicity. Additionally, we assessed the reliability and applicability of the models by comparing them in terms of ground truth, size, and balance. This study highlights the potential of AI-based data extraction and represents a significant contribution to nanotoxicity research.
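
The pipeline code itself is not reproduced on this page. As a rough illustration of the kind of LangChain extraction step the abstract describes, a minimal sketch follows; the model name, prompt wording, and output fields are assumptions for illustration, not the authors' exact setup.

    # Minimal sketch of one LLM extraction step with LangChain (LCEL).
    # Model choice, prompt text, and output fields are illustrative only.
    from langchain_core.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model="gpt-4o", temperature=0)  # hypothetical model choice

    material_prompt = PromptTemplate.from_template(
        "You are extracting nanotoxicity data from a research article.\n"
        "Article text:\n{article}\n\n"
        "List every nanomaterial studied, one per line, as:\n"
        "material | core composition | coating\n"
        "Write None for any field that is not reported."
    )

    chain = material_prompt | llm  # LCEL: pipe the filled prompt into the model

    def extract_materials(article_text: str) -> str:
        """Return the raw, line-structured extraction for one article."""
        return chain.invoke({"article": article_text}).content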

Keywords: Automated machine learning; Data extraction; LangChain; Large Language Models; Nanotoxicity; Prompt engineering.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
Overall workflow of (1) data preparation, (2) data extraction using LLMs, and (3) model development. The training data undergoes automated extraction, evaluation, and prompt engineering to refine the prompts (red line). The test data is automatically extracted using the selected LLM and prompts, followed by data processing and model development through AutoML (blue line).
Fig. 2
A flowchart illustrating the systematic querying of material information, physicochemical (PChem) properties, and toxicological (Tox) properties. The process begins with querying material information, followed by questions on PChem and Tox properties. For PChem properties, the output from the material information query (i) is combined with specific questions on PChem properties (q) and provided as input to the LLM. Missing data points, where material information or Tox properties are labeled as "None," are used for evaluating extraction performance but are excluded from the final datasets.
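
As a sketch of this chaining, the material-information answer (i) can be combined with each physicochemical question (q) and passed back to the model. The question wording, prompt text, and model choice below are invented for illustration.

    # Sketch of the Fig. 2 chaining: the material-information answer (i)
    # is combined with each PChem question (q) and passed back to the LLM.
    # Questions, model choice, and prompt wording are assumptions.
    from langchain_core.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    pchem_prompt = PromptTemplate.from_template(
        "Article text:\n{article}\n\n"
        "Materials identified in this article:\n{materials}\n\n"
        "{question}\n"
        "Answer with one line per material; write None if not reported."
    )

    PCHEM_QUESTIONS = [
        "What is the core size (nm) of each material?",
        "What is the zeta potential (mV) of each material?",
    ]

    def query_pchem(article: str, material_info: str) -> list[str]:
        """Feed output (i) plus each specific question (q) to the LLM."""
        chain = pchem_prompt | llm
        return [chain.invoke({"article": article,
                              "materials": material_info,
                              "question": q}).content
                for q in PCHEM_QUESTIONS]
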
Fig. 3
Prompt engineering from a standard prompt (left) to a carefully designed prompt (right) for extracting material information. The refined prompt guides the LLM in adjusting the output format by providing detailed instructions and employing few-shot prompting, ensuring a consistent and structured output format.
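
The full refined prompt is shown only in the figure itself. As a minimal sketch of the few-shot construction it describes, using LangChain's FewShotPromptTemplate with invented example pairs rather than the paper's actual examples:

    # Few-shot prompting sketch in the spirit of Fig. 3. The example
    # passages and extractions are invented placeholders, not paper data.
    from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

    example_prompt = PromptTemplate.from_template(
        "Passage: {passage}\nExtracted: {extracted}"
    )

    examples = [
        {"passage": "ZnO NPs (20 nm, uncoated) were tested on A549 cells.",
         "extracted": "ZnO | 20 | None | A549"},
        {"passage": "Citrate-coated AgNPs were incubated with HepG2 cells.",
         "extracted": "Ag | None | citrate | HepG2"},
    ]

    few_shot = FewShotPromptTemplate(
        examples=examples,
        example_prompt=example_prompt,
        prefix=("Extract material information in exactly this format:\n"
                "material | size (nm) | coating | cell line\n"
                "Write None for any missing field."),
        suffix="Passage: {passage}\nExtracted:",
        input_variables=["passage"],
    )

    print(few_shot.format(passage="TiO2 nanoparticles (anatase, 25 nm) were tested."))
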
Fig. 4
Prompt engineering from a standard prompt (left) to a carefully designed prompt (right) for extracting physicochemical (A) and toxicological (B) properties. For physicochemical properties (A), the prompt standardizes numerical values, including ranges and error margins, while for toxicological properties (B), it enforces standardized categorical formats, such as abbreviations and delimiters, to enhance accuracy and usability in downstream analysis.
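
In the paper this standardization is enforced inside the prompt itself; an equivalent post-processing safety net for outputs that slip through might look like the following, where the parsing rules (range to midpoint, dropped error margins) are assumptions for illustration:

    # Numeric standardization sketch matching Fig. 4A's description:
    # ranges and error margins are reduced to a single number.
    # The midpoint rule and the regexes are illustrative assumptions.
    import re

    def standardize_value(raw: str) -> float | None:
        """'10-20' -> 15.0, '25 ± 3' -> 25.0, 'None' -> None."""
        raw = raw.strip()
        if raw.lower() == "none":
            return None
        m = re.match(r"([\d.]+)\s*[-–]\s*([\d.]+)", raw)   # range -> midpoint
        if m:
            return (float(m.group(1)) + float(m.group(2))) / 2
        m = re.match(r"([\d.]+)\s*(?:±|\+/-)", raw)        # value ± error
        if m:
            return float(m.group(1))
        m = re.match(r"([\d.]+)", raw)                     # bare number
        return float(m.group(1)) if m else None
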
Fig. 5
Evaluation of data extraction performance. (A) Comparison of time spent on data extraction per paper using manual extraction versus automated extraction by each LLM. The total time for each method is further broken down into manual extraction time, code run time, and embedding time. (B) Average Precision_D.E., Recall_D.E., and F1_D.E. for each LLM, with standard deviations displayed as error bars. (C) Heatmaps showing the percentage of Precision_D.E., Recall_D.E., and F1_D.E. for specific nanotoxicity attributes across each LLM, with color gradients representing performance levels (50 %–100 %).
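
This page does not state exactly how matches were counted when scoring Precision_D.E., Recall_D.E., and F1_D.E.; the sketch below applies the standard per-field definitions against a manually curated record, which is an assumption about the paper's scoring rules:

    # Per-record extraction scoring sketch: compare extracted fields to a
    # manual ground truth. Exact-match comparison is an assumption.
    def extraction_scores(extracted: dict, truth: dict) -> tuple[float, float, float]:
        keys = set(extracted) | set(truth)
        tp = sum(1 for k in keys
                 if extracted.get(k) is not None and extracted.get(k) == truth.get(k))
        fp = sum(1 for k in keys
                 if extracted.get(k) is not None and extracted.get(k) != truth.get(k))
        fn = sum(1 for k in keys
                 if truth.get(k) is not None and extracted.get(k) != truth.get(k))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1
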
Fig. 6
Performance metrics of AutoML models trained on the HaHa-Auto, HaHa-Manual, and Ha IIIB datasets. The box plots show (A) Accuracy_N.P., (B) F1_N.P., (C) Precision_N.P., and (D) Recall_N.P., comparing results across Vertex AI, Azure, SageMaker, and Dataiku. Each colored dot represents the best-performing model on each platform. The black horizontal line represents the median, while the plus sign indicates the mean value. Note that the Ha IIIB dataset did not meet the 1,000-row threshold required for model training on Vertex AI, so performance metrics for Vertex AI on Ha IIIB are not shown.
Fig. 7
Feature importance of the models trained on the HaHa-Auto dataset. The chart shows the mean and standard deviation of feature importance across the AutoML platforms.
Fig. 8
Comparison of the applicability domain (AD) of models trained on HaHa-Auto and Ha IIIB datasets for numerical attributes. The thresholds are determined based on the 95 % confidence level of the Euclidean distance distribution. Only the training data from each dataset is visualized.
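
The caption specifies only that the AD thresholds come from the 95 % confidence level of the Euclidean distance distribution. The sketch below assumes distances are measured to the training-set centroid, which is one common choice rather than the paper's confirmed method:

    # Applicability-domain sketch: a sample is in-domain if its Euclidean
    # distance to the training centroid is below the 95th percentile of
    # the training distances. Centroid-based distance is an assumption.
    import numpy as np

    def fit_ad(X_train: np.ndarray) -> tuple[np.ndarray, float]:
        centroid = X_train.mean(axis=0)
        dists = np.linalg.norm(X_train - centroid, axis=1)
        return centroid, float(np.percentile(dists, 95))

    def in_domain(x: np.ndarray, centroid: np.ndarray, threshold: float) -> bool:
        return float(np.linalg.norm(x - centroid)) <= threshold
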
