Front Mol Biosci. 2025 Apr 9:12:1567199.
doi: 10.3389/fmolb.2025.1567199. eCollection 2025.

Identification of metabolomics-based biomarker discovery in individuals with down syndrome utilizing kernel-tree model-enhanced explainable artificial intelligence methodology

Cemil Colak et al. Front Mol Biosci.

Abstract

This study aims to develop an explainable artificial intelligence (XAI) model integrated with machine learning (ML) to investigate metabolic differences between individuals with Down syndrome (T21) and healthy controls (D21) and to identify novel/pathway-specific biomarkers. ML classifiers (AdaBoost, LightGBM, Random Forest, KTBoost, and XGBoost) were applied to metabolomics data obtained by high-resolution liquid chromatography-mass spectrometry (LC-MS) from blood plasma samples of 316 T21 and 103 D21 individuals, and the importance of each metabolite was evaluated by SHAP analysis. The KTBoost model showed the highest classification performance, with an accuracy of 90.4% and an area under the curve (AUC) of 95.9%, outperforming AdaBoost, LightGBM, Random Forest, and XGBoost. Several metabolites were significantly downregulated or upregulated in the T21 group compared with the D21 group. Metabolites such as vitamin C, taurolithocholic acid, sphingosine, and prostaglandin A2/B2/J2 were observed at lower levels in the T21 group, whereas metabolites such as thymidine, tauroursodeoxycholic acid, serine, and nervonic acid were elevated. SHAP analysis indicated that L-citrulline, kynurenine, prostaglandin A2/B2/J2, urate, and pantothenate could be novel/pathway-specific biomarkers for differentiating the T21 group. The study revealed significant metabolic alterations in individuals with T21 and demonstrated the effectiveness of combining ML and XAI methods to identify novel/pathway-specific biomarkers. The findings may contribute to a better understanding of the molecular mechanisms of Down syndrome and to the development of future diagnostic and therapeutic strategies.
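To put the reported accuracy and AUC in context, note that the two groups are imbalanced (316 T21 versus 103 D21), so a classifier that always predicts T21 would already be right about three times out of four. The plain-Python sketch below computes that majority-class baseline and shows one common way to compute an AUC, as the fraction of positive/negative pairs that the scores rank correctly. The function names and the toy scores are illustrative assumptions and are not taken from the study.

```python
def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy obtained by always predicting the larger class."""
    total = n_positive + n_negative
    return max(n_positive, n_negative) / total


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Class counts reported in the abstract: 316 T21 and 103 D21.
baseline = majority_baseline_accuracy(316, 103)
print(f"majority-class baseline accuracy: {baseline:.3f}")

# Hypothetical toy scores (1 = T21, 0 = D21), for illustration only.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
print(f"toy AUC: {roc_auc(toy_labels, toy_scores):.3f}")
```

The baseline works out to roughly 75.4%, which is the figure the reported 90.4% accuracy should be compared against. AUC is less sensitive to this imbalance because it depends only on how positives are ranked relative to negatives.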

Keywords: KTBoost; SHAP; biomarker; Down syndrome; machine learning; metabolomics analysis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Figures

FIGURE 1
Volcano plot.
FIGURE 2
VIP graph for PLS-DA model.
FIGURE 3
Confusion matrix of the KTBoost model for Down syndrome prediction.
FIGURE 4
Different model results for ROC AUC values.
FIGURE 5
Graphical representation of the class probabilities of the optimal KTBoost model.
FIGURE 6
KTBoost model interpretation. (A) Using the final model, we rank the stability and interpretative relevance of the top 20 biomarker metabolites. (B) Average order of importance (|SHAP value|) of the top 20 biomarker metabolites; the greater the SHAP value of a feature, the more probable it is that the patient has T21.
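
The ranking in panel (B) is based on the average absolute SHAP value per metabolite. A minimal sketch of that aggregation follows, assuming per-sample SHAP values have already been computed and are available as a list of rows. The function name, the feature names, and the numbers are made-up placeholders, not values from the study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    if not shap_rows:
        raise ValueError("shap_rows must not be empty")
    n_samples = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length must match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [t / n_samples for t in totals]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Placeholder values: one row per sample, one column per feature.
names = ["metabolite_a", "metabolite_b", "metabolite_c"]
rows = [
    [0.25, -0.5, 0.0],
    [-0.75, 0.25, 0.25],
]
ranking = rank_by_mean_abs_shap(rows, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Averaging absolute values discards the sign, so this ranking says how strongly a metabolite influences predictions but not in which direction. The direction (toward T21 or toward D21) has to be read from the signed per-sample values.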
