BMC Cancer. 2025 Aug 13;25(1):1311.
doi: 10.1186/s12885-025-14664-1.

Identification of optimal biomarkers associated with distant metastasis in breast cancer using Boruta and Lasso machine learning algorithms

Jia-Ning Qin et al. BMC Cancer. 2025.

Abstract

Objective: The aim of this study was to identify optimal biomarkers associated with distant metastasis in patients with breast cancer from among nutritional and inflammatory indicators using the Boruta and Least Absolute Shrinkage and Selection Operator (LASSO) machine learning algorithms, thereby improving the ability to identify distant metastasis.

Methods: A total of 348 patients newly diagnosed with breast cancer were included, comprising 185 patients with nonmetastatic breast cancer and 163 patients with distant metastatic breast cancer. The variables were initially screened using the Boruta algorithm, followed by further optimization through LASSO regression. The selected key indicators were evaluated for their association with distant metastasis risk using multivariate logistic regression analysis and restricted cubic spline functions. Discriminative performance was assessed through ROC curve analysis.
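To illustrate the LASSO tuning step, the following minimal sketch applies the 1-SE rule that the Fig. 2 caption names as the criterion for choosing λ. It assumes the common convention of taking the largest λ whose mean cross-validated error lies within one standard error of the minimum. The function name `lambda_one_se` and all numbers are hypothetical and are not taken from the study, and the source does not state which software was used.

```python
from typing import Sequence


def lambda_one_se(
    lambdas: Sequence[float],
    cv_mean: Sequence[float],
    cv_se: Sequence[float],
) -> float:
    """Return the largest lambda whose mean CV error is within
    one standard error of the minimum mean CV error."""
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Index of the lambda with the lowest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # Errors up to this threshold count as statistically similar to the best.
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    # A larger lambda means a stronger penalty and usually fewer variables.
    return max(eligible)


# Hypothetical cross-validation summary (illustrative values only).
lambdas = [0.001, 0.01, 0.05, 0.10, 0.20]
cv_mean = [0.665, 0.660, 0.662, 0.668, 0.700]
cv_se = [0.010, 0.010, 0.011, 0.012, 0.015]

chosen = lambda_one_se(lambdas, cv_mean, cv_se)
print(chosen)
```

Preferring the larger λ trades a small, statistically negligible loss in cross-validated fit for a sparser model, which is consistent with LASSO reducing the Boruta shortlist to fewer variables.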

Results: Boruta and LASSO analyses identified five important indicators: the advanced lung cancer inflammation index (ALI), systemic inflammation response index (SIRI), monocyte-to-lymphocyte ratio (MLR), albumin-to-globulin ratio (AGR), and geriatric nutritional risk index (GNRI). Multivariate logistic regression analysis revealed that an elevated SIRI and MLR were associated with an increased risk of distant metastasis in patients with breast cancer, whereas a higher ALI, AGR, and GNRI were associated with a reduced risk. ROC analysis indicated moderate predictive performance for these indicators, with AUC values of approximately 0.65.
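For readers less familiar with the AUC, the sketch below computes it for a single indicator as the probability that a randomly chosen metastatic patient has a higher value than a randomly chosen nonmetastatic patient, counting ties as one half. This is the standard rank-based definition, not necessarily the routine the authors used. The function name `roc_auc` and the toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = distant metastasis, 0 = no metastasis (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.2]

auc = roc_auc(labels, scores)
# For an indicator where higher values mean lower risk, negate the scores
# so that the AUC is read on the same scale.
auc_flipped = roc_auc(labels, [-s for s in scores])
print(round(auc, 3), round(auc_flipped, 3))
```

On this scale 0.5 corresponds to chance and 1.0 to perfect separation, so values near 0.65 indicate that each indicator alone separates the two groups only modestly.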

Conclusion: The ALI, SIRI, MLR, AGR, and GNRI are effective biomarkers for identifying the risk of distant metastasis in patients with breast cancer. These indicators could be incorporated into clinical practice to improve risk stratification, guide personalized treatment, and enhance patient outcomes.

Keywords: Biomarker; Boruta; Breast cancer; Distant metastasis; LASSO.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study was conducted in accordance with the ethical standards of the institutional and/or national research committee and with the principles of the Declaration of Helsinki ( https://www.wma.net/policies-post/wma-declaration-of-helsinki/ ). Ethical approval was obtained from the Medical Ethics Committee of Guangxi Medical University Cancer Hospital (Reference Number: KY2023868). Given the retrospective nature of the study, the requirement for informed consent was waived by the Medical Ethics Committee of Guangxi Medical University Cancer Hospital. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Spearman correlation heatmap of inflammation and nutritional indices. Red indicates positive correlations, and blue indicates negative correlations between the indices and the risk of distant metastasis in patients with breast cancer
Fig. 2
Using two machine learning algorithms to select indicators related to distant metastasis in patients with breast cancer. Initial variable selection using the Boruta algorithm. The results indicate that the AGR, mGNRI, GNRI, ALB concentration, ALI, MLR, and SIRI are important factors for predicting distant metastasis in patients with breast cancer (A). Variable refinement through LASSO regression. The λ value determined by the 1-SE rule was selected as the optimal parameter, resulting in the identification of 5 important variables: the ALI, SIRI, MLR, AGR, and the GNRI (B-C). ALI, advanced lung cancer inflammatory index; SIRI, systemic inflammatory response index; MLR, monocyte-to-lymphocyte ratio; AGR, albumin-to-globulin ratio; GNRI, geriatric nutritional risk index
Fig. 3
Relationships of the ALI, SIRI, MLR, AGR, and GNRI with the risk of distant metastasis in patients with breast cancer. Standardized data and quartiles were used in the logistic regression model, and trend tests were conducted. Model 1 was a crude model. Model 2 was adjusted for age, education, insurance status, hypertension status, diabetes status, marital status, location, and menstrual status. Model 3 was adjusted for age, education, insurance status, hypertension status, diabetes status, marital status, location, menstrual status, pathological type, histological grade, and subtype. ALI, advanced lung cancer inflammatory index; SIRI, systemic inflammatory response index; MLR, monocyte-to-lymphocyte ratio; AGR, albumin-to-globulin ratio; GNRI, geriatric nutritional risk index
Fig. 4
Relationships of the ALI, SIRI, MLR, AGR, and GNRI with distant metastasis in patients with breast cancer, analyzed using the RCS function (A-E). The model was set with three knots located at the 10th, 50th, and 90th percentiles. The Y-axis represents the odds ratio (OR) for distant metastatic breast cancer for any value compared with the reference value (50th percentile). The logistic regression model was adjusted for age, education, insurance status, hypertension status, diabetes status, marital status, location, menstrual status, pathological type, histological grade, and subtype. ALI, advanced lung cancer inflammatory index; SIRI, systemic inflammatory response index; MLR, monocyte-to-lymphocyte ratio; AGR, albumin-to-globulin ratio; GNRI, geriatric nutritional risk index
Fig. 5
ROC curves of the ALI, SIRI, MLR, AGR, and GNRI for the prediction of distant metastasis in breast cancer patients. ALI, advanced lung cancer inflammatory index; SIRI, systemic inflammatory response index; MLR, monocyte-to-lymphocyte ratio; AGR, albumin-to-globulin ratio; GNRI, geriatric nutritional risk index
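
The restricted cubic spline setup described for Fig. 4 (three knots at the 10th, 50th, and 90th percentiles, with the 50th percentile as the reference) can be sketched as follows. This assumes the widely used truncated-power form of the basis, which may differ in scaling from what the authors' software produced. The function names, the sample values, and the coefficients `beta_lin` and `beta_nl` are hypothetical stand-ins for fitted logistic regression estimates.

```python
import math
import statistics
from typing import Sequence, Tuple

Knots = Tuple[float, float, float]


def rcs_knots(values: Sequence[float]) -> Knots:
    """Knots at the 10th, 50th, and 90th percentiles of the data."""
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    return cuts[0], cuts[4], cuts[8]


def rcs_nonlinear(x: float, knots: Knots) -> float:
    """Single nonlinear term of a three-knot restricted cubic spline.
    It is zero below the first knot and linear above the last knot."""
    t1, t2, t3 = knots

    def cube_plus(u: float) -> float:
        return u ** 3 if u > 0 else 0.0

    raw = (
        cube_plus(x - t1)
        - cube_plus(x - t2) * (t3 - t1) / (t3 - t2)
        + cube_plus(x - t3) * (t2 - t1) / (t3 - t2)
    )
    return raw / (t3 - t1) ** 2  # conventional scaling of the basis


def odds_ratio_vs_reference(
    x: float, ref: float, beta_lin: float, beta_nl: float, knots: Knots
) -> float:
    """OR at x relative to ref. Covariate terms cancel in the difference
    of log-odds, so only the spline terms remain."""
    log_or = beta_lin * (x - ref) + beta_nl * (
        rcs_nonlinear(x, knots) - rcs_nonlinear(ref, knots)
    )
    return math.exp(log_or)


# Illustrative indicator values 1..101 (not study data).
values = list(range(1, 102))
knots = rcs_knots(values)
reference = knots[1]  # 50th percentile

# Hypothetical coefficients standing in for a fitted model.
or_at_80 = odds_ratio_vs_reference(80.0, reference, 0.02, -0.01, knots)
print(knots, round(or_at_80, 3))
```

With three knots the spline adds only one nonlinear term to the linear term, so a test of whether its coefficient is zero serves as the test for nonlinearity, and by construction the plotted OR equals 1 at the reference value.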
