Proposed Comprehensive Methodology Integrated with Explainable Artificial Intelligence for Prediction of Possible Biomarkers in Metabolomics Panel of Plasma Samples for Breast Cancer Detection
- PMID: 40282875
- PMCID: PMC12029019
- DOI: 10.3390/medicina61040581
Abstract
Aim: Breast cancer (BC) is the most common type of cancer in women, accounting for more than 30% of new female cancers each year. Although various treatments are available for BC, most cancer-related deaths are due to incurable metastases. Early diagnosis and treatment of BC, before metastasis occurs, are therefore crucial. Mammography and ultrasonography are the primary clinical tools for the initial identification and staging of BC; these methods are useful for general screening but have limited sensitivity and specificity. Omics-based biomarkers, such as those derived from metabolomics, can improve the accuracy of early diagnosis and of disease-progression tracking, and can support personalized treatment plans tailored to each tumor's specific molecular profile. Metabolomics is a feasible and comprehensive approach to early disease detection and biomarker identification at the molecular level. This research aimed to establish an interpretable predictive artificial intelligence (AI) model using plasma-based metabolomics panel data to identify potential biomarkers that distinguish individuals with BC from healthy controls.
Methods: A cohort of 138 BC patients and 76 healthy controls was studied. Plasma metabolites were examined using LC-TOFMS and GC-TOFMS techniques. Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and Random Forest (RF) were evaluated using performance metrics such as Receiver Operating Characteristic-Area Under the Curve (ROC AUC), accuracy, sensitivity, specificity, and F1 score. ROC and Precision-Recall (PR) curves were generated for comparative analysis. SHapley Additive exPlanations (SHAP) analysis was applied to the optimal prediction model to assess its interpretability.
Results: The RF algorithm achieved an accuracy of 0.963 ± 0.043 and a sensitivity of 0.977 ± 0.051, whereas LightGBM achieved the highest ROC AUC (0.983 ± 0.028). RF also achieved the best Precision-Recall Area Under the Curve (PR AUC), at 0.989. SHAP analysis identified glycerophosphocholine and pentosidine as the most significant discriminatory metabolites. Uracil, glutamine, and butyrylcarnitine were also among the significant metabolites.
Conclusions: Metabolomics biomarkers combined with an explainable AI (XAI)-based prediction model showed high diagnostic accuracy and sensitivity in the detection of BC. The proposed XAI system, which uses interpretable metabolite data, can serve as a clinical decision support tool to improve early diagnosis processes.
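The performance metrics named in the Methods can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not data from the study, and the function names (`classification_metrics`, `roc_auc`) and the 0.5 decision threshold are assumptions chosen for illustration. The sketch uses only the Python standard library and computes ROC AUC through its rank interpretation: the probability that a randomly chosen BC sample receives a higher score than a randomly chosen control.

```python
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from binary labels (1 = BC, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy example (invented values): 5 BC samples and 3 controls.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.4, 0.2, 0.1]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("ROC AUC:", round(auc, 3))
```

Accuracy, sensitivity, specificity, and F1 depend on the chosen threshold, whereas ROC AUC is computed from the scores alone, which is one reason two models can rank differently on these metrics.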
Keywords: biomarker; breast cancer; explainable artificial intelligence; machine learning; metabolomics.
Conflict of interest statement
The authors declare no conflicts of interest.