Identification of optimal biomarkers associated with distant metastasis in breast cancer using Boruta and Lasso machine learning algorithms
- PMID: 40804615
- PMCID: PMC12345036
- DOI: 10.1186/s12885-025-14664-1
Abstract
Objective: The aim of this study was to identify optimal biomarkers associated with distant metastasis in patients with breast cancer from among nutritional and inflammatory indicators using the Boruta and Least Absolute Shrinkage and Selection Operator (LASSO) machine learning algorithms, thereby improving the ability to identify distant metastasis.
Methods: A total of 348 patients newly diagnosed with breast cancer were included, comprising 185 patients with nonmetastatic breast cancer and 163 patients with distant metastatic breast cancer. Candidate variables were first screened with the Boruta algorithm and then refined with LASSO regression. The association of the selected key indicators with distant metastasis risk was evaluated using multivariate logistic regression analysis and restricted cubic spline functions. Discriminative performance was assessed through receiver operating characteristic (ROC) curve analysis.
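The abstract reports discrimination as ROC AUC but does not describe how it was computed. As a minimal sketch (not the authors' code), the AUC of a single marker can be computed directly from its definition: the probability that a randomly chosen metastatic patient (label 1) has a higher marker value than a randomly chosen nonmetastatic patient (label 0), with ties counted as one half. This equals the Mann-Whitney U statistic divided by the number of case/control pairs. The function name `roc_auc` and any example data are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    Hypothetical helper for illustration: label 1 = distant metastasis,
    label 0 = no distant metastasis. Higher scores are assumed to mean higher risk.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # case ranked above control
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For markers where higher values were associated with lower risk (ALI, AGR, and GNRI in this study), the same function would be applied to the negated values, which gives exactly 1 minus the raw AUC under this tie convention; whether the authors oriented the scores this way is not stated in the abstract.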
Results: Boruta and LASSO analyses identified five important indicators: the advanced lung cancer inflammation index (ALI), systemic inflammation response index (SIRI), monocyte-to-lymphocyte ratio (MLR), albumin-to-globulin ratio (AGR), and geriatric nutritional risk index (GNRI). Multivariate logistic regression analysis revealed that elevated SIRI and MLR values were associated with an increased risk of distant metastasis in patients with breast cancer, whereas higher ALI, AGR, and GNRI values were associated with a reduced risk. ROC analysis indicated moderate predictive performance for these indicators, with area under the curve (AUC) values of approximately 0.65.
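The abstract names the five indicators but does not give their formulas. The sketch below uses definitions that are common in the literature; whether the study used exactly these definitions, units, and the Lorentz ideal-weight formula for GNRI is an assumption. Cell counts are taken in the same units (e.g. 10^9/L), albumin in g/dL for ALI and g/L for GNRI. All function names are hypothetical.

```python
def mlr(monocytes: float, lymphocytes: float) -> float:
    """Monocyte-to-lymphocyte ratio (counts in the same units, e.g. 10^9/L)."""
    return monocytes / lymphocytes


def siri(neutrophils: float, monocytes: float, lymphocytes: float) -> float:
    """Systemic inflammation response index: neutrophils x monocytes / lymphocytes."""
    return neutrophils * monocytes / lymphocytes


def agr(albumin: float, globulin: float) -> float:
    """Albumin-to-globulin ratio (both in the same units, e.g. g/L)."""
    return albumin / globulin


def ali(bmi: float, albumin_g_dl: float, neutrophils: float, lymphocytes: float) -> float:
    """Advanced lung cancer inflammation index: BMI (kg/m^2) x albumin (g/dL) / NLR."""
    nlr = neutrophils / lymphocytes  # neutrophil-to-lymphocyte ratio
    return bmi * albumin_g_dl / nlr


def lorentz_ideal_weight(height_cm: float, female: bool) -> float:
    """Ideal body weight (kg) by the Lorentz formula (assumed reference for GNRI)."""
    divisor = 2.5 if female else 4.0
    return height_cm - 100.0 - (height_cm - 150.0) / divisor


def gnri(albumin_g_l: float, weight_kg: float, height_cm: float, female: bool) -> float:
    """Geriatric nutritional risk index: 1.489 x albumin (g/L) + 41.7 x (weight / ideal weight).

    The weight ratio is capped at 1 when actual weight exceeds ideal weight.
    """
    ratio = min(weight_kg / lorentz_ideal_weight(height_cm, female), 1.0)
    return 1.489 * albumin_g_l + 41.7 * ratio
```

Under these definitions, higher ALI, AGR, and GNRI reflect better nutritional status or lower systemic inflammation, while higher SIRI and MLR reflect greater inflammation, which matches the direction of the associations reported above.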
Conclusion: The ALI, SIRI, MLR, AGR, and GNRI are effective biomarkers for identifying the risk of distant metastasis in patients with breast cancer. These indicators could be incorporated into clinical practice to improve risk stratification, guide personalized treatment, and enhance patient outcomes.
Keywords: Biomarker; Boruta; Breast cancer; Distant metastasis; LASSO.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: This study was conducted in accordance with the ethical standards of the institutional and/or national research committee and with the principles of the Declaration of Helsinki (https://www.wma.net/policies-post/wma-declaration-of-helsinki/). Ethical approval was obtained from the Medical Ethics Committee of Guangxi Medical University Cancer Hospital (Reference Number: KY2023868). Given the retrospective nature of the study, the requirement for informed consent was waived by the Medical Ethics Committee of Guangxi Medical University Cancer Hospital. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.
Similar articles
- Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm. Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12. PMID: 37306629. Free PMC article.
- Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases? Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22. PMID: 38517402.
- Interpretable machine learning analysis of immunoinflammatory biomarkers for predicting CHD among NAFLD patients. Cardiovasc Diabetol. 2025 Jul 3;24(1):263. doi: 10.1186/s12933-025-02818-1. PMID: 40611098. Free PMC article.
- Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy. Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340. PMID: 16959170.
- [Volume and health outcomes: evidence from systematic reviews and from evaluation of Italian hospital data]. Epidemiol Prev. 2013 Mar-Jun;37(2-3 Suppl 2):1-100. PMID: 23851286. In Italian.