Sci Rep. 2025 May 25;15(1):18186.
doi: 10.1038/s41598-025-02590-y.

Prediction of reproductive and developmental toxicity using an attention and gate augmented graph convolutional network

Si Hoon Lee et al. Sci Rep. 2025.

Abstract

Predicting the reproductive and developmental toxicity of chemical compounds remains a challenge because of their diverse molecular structures and intricate biological pathways of toxicity. Traditional Quantitative Structure-Activity Relationship (QSAR) models that rely on molecular descriptors struggle to capture this complexity, which limits their predictive performance. In this study, we developed a descriptor-free deep learning model: a Graph Convolutional Network (GCN) with multi-head attention and gated skip-connections that predicts reproductive and developmental toxicity. By integrating structural alerts directly related to toxicity into the model, we enabled more effective learning of toxicologically relevant substructures. We built a dataset of 4,514 diverse compounds, including both organic and inorganic substances. The model was trained and validated using stratified 5-fold cross-validation and achieved an accuracy of 81.19% on the test set. To address the interpretability of the deep learning model, we identified subgraphs corresponding to known structural alerts, providing insights into the model's decision-making process. This study was conducted in accordance with the OECD principles for reliable QSAR modeling and contributes to the development of robust in silico models for toxicity prediction.
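To make the gated skip-connection mentioned in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one commonly used formulation: a graph convolution over a row-normalized adjacency matrix with self-loops, followed by a per-node sigmoid gate that mixes the layer input with the convolved output. The function names, the toy three-atom graph, and all weights are hypothetical, and multi-head attention is omitted for brevity.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def normalized_adjacency(adj):
    """Add self-loops, then row-normalize so each node averages itself and its neighbors."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def gated_gcn_layer(adj, feats, weight, gate_in, gate_out, gate_bias=0.0):
    """One graph convolution followed by a gated skip-connection (assumed form).

    adj: n x n adjacency matrix; feats: n x d feature matrix;
    weight: d x d convolution weights; gate_in, gate_out: length-d gate vectors.
    """
    a_hat = normalized_adjacency(adj)
    conv = matmul(matmul(a_hat, feats), weight)
    conv = [[max(0.0, v) for v in row] for row in conv]  # ReLU activation
    out = []
    for h_old, h_new in zip(feats, conv):
        # Scalar gate per node, computed from both the input and the convolved features.
        z = sigmoid(
            sum(u * a for u, a in zip(gate_in, h_old))
            + sum(u * b for u, b in zip(gate_out, h_new))
            + gate_bias
        )
        # z near 1 keeps the new features; z near 0 passes the input through unchanged.
        out.append([z * b + (1.0 - z) * a for a, b in zip(h_old, h_new)])
    return out

# Toy graph: a three-atom chain C-C-O with one-hot [C, O] atom features (illustrative only).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gated_gcn_layer(adj, feats, identity, gate_in=[0.0, 0.0], gate_out=[0.0, 0.0])
for row in out:
    print([round(v, 3) for v in row])
```

With zero gate weights the gate equals 0.5, so each node's output is the midpoint between its input features and its neighborhood average. In a trained model the gate weights would be learned, letting the network decide per node how much of the earlier representation to preserve.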

Keywords: Graph convolutional networks; Quantitative structure-activity relationship (QSAR); Reproductive and developmental toxicity; Toxicity prediction.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Overview of the model architecture. Canonical SMILES notations serve as the initial representations for each molecular input, which are subsequently converted into graph structures comprising an adjacency matrix and a feature matrix. The number of GCN Layers, GCN Blocks, and the number of nodes in each fully connected (FC) Layer are hyperparameters that are explored through the optimization process.
Fig. 2
Murcko scaffolds of the (a) positive and (b) negative datasets. Each structure is scaled according to its frequency within the dataset, with larger sizes indicating higher frequencies and smaller sizes indicating lower frequencies.
Fig. 3
(a) PCA, (b) UMAP, and (c) t-SNE-based cluster analyses of the readout-layer embeddings for the 10 most frequently observed scaffolds (n = 310). Each color corresponds to a distinct scaffold, as depicted in (d).
Fig. 4
Hyperparameter importance and optimization history. (a) shows the ranking of hyperparameters by importance, and (b) presents the optimization history plot, where gray dots represent the individual validation loss values for each trial and the red line tracks the lowest validation loss up to that point.
Fig. 5
Predictive performance of the Param6 model. (a–e) show the confusion matrices for the test sets of each fold, while (f) shows the curves of training loss and validation loss during the model training process.
Fig. 6
The 5-fold average AUC-ROC curve for each of the six hyperparameter sets. The gray curves represent the AUC-ROC curve for each fold. The red dashed line represents the performance of a random classifier, which serves as a baseline with an AUC of 0.5.
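As a rough illustration of the AUC values behind Fig. 6, the sketch below computes AUC-ROC as the probability that a randomly chosen positive compound receives a higher score than a randomly chosen negative one, then averages it over folds. The function names and the toy labels and scores are assumptions made for illustration; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(folds):
    """Average the per-fold AUC over a list of (labels, scores) pairs."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)

# Hypothetical fold with informative scores: 3 of 4 positive/negative pairs are ordered correctly.
informative = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
# Hypothetical fold with uninformative constant scores: every pair is a tie.
uninformative = ([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])

print(roc_auc(*informative))      # 0.75
print(roc_auc(*uninformative))    # 0.5
print(mean_auc([informative, uninformative]))  # 0.625
```

The constant-score fold lands exactly on the 0.5 baseline that the red dashed line marks, which is why curves above that line indicate ranking ability better than chance.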
