An advanced machine learning method for simultaneous breast cancer risk prediction and risk ranking in Chinese population: A prospective cohort and modeling study
- PMID: 38403898
- PMCID: PMC11374254
- DOI: 10.1097/CM9.0000000000002891
Abstract
Background: Breast cancer (BC) risk-stratification tools for Asian women that are both highly accurate and readily interpretable are lacking. We aimed to develop risk-stratification models to predict long- and short-term BC risk among Chinese women and to simultaneously rank potential non-experimental risk factors.
Methods: The Breast Cancer Cohort Study in Chinese Women, a large, ongoing, prospective dynamic cohort study, includes 122,058 women aged 25-70 years from eastern China. Using parametric machine-learning methods (penalized logistic regression, bootstrap, and ensemble learning), we developed two models to estimate BC risk: the short-term ensemble penalized logistic regression (EPLR) risk prediction model and the ensemble penalized long-term (EPLT) risk prediction model. The models were assessed for calibration and discrimination and were then externally validated in new study participants from 2017 to 2020.
Results: The area under the receiver operating characteristic curve (AUC) of the short-term EPLR risk prediction model was 0.800 in the internal validation set and 0.751 in the external validation set. For the long-term EPLT risk prediction model, the AUC was 0.692 and 0.760 in internal and external validation, respectively. In external validation, the net reclassification improvement index of the EPLT model relative to the Gail model and the Han Chinese Breast Cancer Prediction Model (HCBCP) was 0.193 and 0.233, respectively, indicating that the EPLT model classified risk more accurately than either comparator.
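The combination of penalized logistic regression, bootstrap, and ensemble learning named in the Methods can be illustrated with a minimal sketch. The abstract does not state the penalty type, the number of bootstrap resamples, how member models are combined, or how risk factors are ranked, so everything below is an assumption made for illustration: an L2 penalty, 10 resamples, averaging of predicted probabilities, ranking by mean absolute coefficient, and synthetic data with hypothetical factor names `x0`, `x1`, `x2`. It is not the authors' EPLR or EPLT implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalized_lr(X, y, lam=0.1, lr=0.5, epochs=200):
    """L2-penalized logistic regression fitted by batch gradient descent.
    Returns (weights, intercept); the intercept is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(p):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def fit_ensemble(X, y, n_models=10, lam=0.1, seed=0):
    """Fit one penalized model per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_penalized_lr([X[i] for i in idx], [y[i] for i in idx], lam=lam))
    return models


def predict_risk(models, x):
    """Ensemble risk: mean predicted probability over the member models (assumed rule)."""
    probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for w, b in models]
    return sum(probs) / len(probs)


def rank_factors(models, names):
    """Rank factors by mean absolute coefficient (assumed rule; needs comparable scales)."""
    scores = [sum(abs(w[j]) for w, _ in models) / len(models) for j in range(len(names))]
    return sorted(zip(names, scores), key=lambda t: -t[1])


# Synthetic data (not from the cohort): x0 is a strong factor, x1 is noise, x2 is weak.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * x[0] - 0.5 * x[2] - 0.5) else 0 for x in X]

models = fit_ensemble(X, y)
ranking = rank_factors(models, ["x0", "x1", "x2"])
high = predict_risk(models, [2.0, 0.0, 0.0])
low = predict_risk(models, [-2.0, 0.0, 0.0])
print(ranking)
print(high, low)
```

Averaging over bootstrap resamples is one common way to stabilize both the predicted risk and the coefficient-based ranking; ranking by coefficient magnitude is only meaningful when the predictors are on comparable scales, as the standardized synthetic features are here.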
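For readers unfamiliar with the net reclassification improvement (NRI) reported above, the following sketch computes the standard categorical form: the net proportion of women who developed BC and were moved to a higher risk category, plus the net proportion of women who did not and were moved to a lower one. The abstract does not say whether a categorical or continuous NRI was used or which risk cutoffs applied, so the single cutoff of 0.2 and the toy risks below are hypothetical.

```python
def categorize(risk, cutoffs):
    """Index of the risk category: the number of cutoffs at or below the risk."""
    return sum(risk >= c for c in cutoffs)


def net_reclassification_improvement(old_risk, new_risk, outcome, cutoffs):
    """Categorical NRI of a new model relative to an old one.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, y in zip(old_risk, new_risk, outcome):
        move = categorize(new, cutoffs) - categorize(old, cutoffs)
        if y == 1:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    if n_e == 0 or n_n == 0:
        raise ValueError("NRI needs at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy data (hypothetical): four events followed by four non-events.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
old_risk = [0.1, 0.3, 0.1, 0.3, 0.3, 0.1, 0.3, 0.1]
new_risk = [0.3, 0.3, 0.3, 0.3, 0.1, 0.1, 0.3, 0.3]

nri = net_reclassification_improvement(old_risk, new_risk, outcome, cutoffs=[0.2])
print(nri)  # 0.5: two of four events move up; non-events net to zero
```

A positive value means the new model moves more people in the correct direction than in the wrong one, which is how the reported values of 0.193 and 0.233 are read.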
Conclusions: We developed the EPLR and EPLT models to screen populations with a high risk of developing BC. These can serve as useful tools to aid in risk-stratified screening and BC prevention.
Copyright © 2024 The Chinese Medical Association, produced by Wolters Kluwer, Inc. under the CC-BY-NC-ND license.
Conflict of interest statement
None.