An advanced machine learning method for simultaneous breast cancer risk prediction and risk ranking in Chinese population: A prospective cohort and modeling study

Chin Med J (Engl). 2024 Sep 5;137(17):2084-2091.

doi: 10.1097/CM9.0000000000002891. Epub 2024 Feb 26.

Liyuan Liu^{1,2}, Yong He^{2,3}, Chunyu Kao³, Yeye Fan², Fu Yang³, Fei Wang^{1,4}, Lixiang Yu^{1,4}, Fei Zhou^{1,4}, Yujuan Xiang^{1,4}, Shuya Huang^{1,4}, Chao Zheng^{1,4}, Han Cai^{1,4}, Heling Bao⁵, Liwen Fang⁶, Linhong Wang⁶, Zengjing Chen², Zhigang Yu^{1,4}

Affiliations

¹ Department of Breast Surgery, The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250033, China.
² School of Mathematics, Shandong University, Jinan, Shandong 250100, China.
³ Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, China.
⁴ Institute of Translational Medicine of Breast Disease Prevention and Treatment, Shandong University, Jinan, Shandong 250033, China.
⁵ Department of Maternal and Child Health, School of Public Health, Peking University, Haidian District, Beijing 100191, China.
⁶ National Center for Chronic and Non-communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100050, China.

PMID: 38403898
PMCID: PMC11374254
DOI: 10.1097/CM9.0000000000002891

Liyuan Liu et al. Chin Med J (Engl). 2024.

Abstract

Background: Highly accurate and interpretable breast cancer (BC) risk-stratification tools for Asian women are lacking. We aimed to develop risk-stratification models to predict long- and short-term BC risk among Chinese women and to simultaneously rank potential non-experimental risk factors.

Methods: The Breast Cancer Cohort Study in Chinese Women, a large ongoing prospective dynamic cohort study, includes 122,058 women aged 25-70 years from eastern China. Using parametric methods (penalized logistic regression, bootstrap, and ensemble learning), we developed two machine-learning models to estimate BC risk: the short-term ensemble penalized logistic regression (EPLR) risk prediction model and the ensemble penalized long-term (EPLT) risk prediction model. The models were assessed for calibration and discrimination and were then externally validated in new study participants from 2017 to 2020.
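
The combination of penalized logistic regression, bootstrap resampling, and ensemble learning named in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes an L1 penalty, fitting by proximal gradient descent, and simple averaging of predicted probabilities, none of which the abstract specifies. All function names, hyperparameters, and the synthetic data are hypothetical. Counting how often each factor keeps a nonzero coefficient across the bootstrap models gives one possible way to rank risk factors.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def linear_score(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=200):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, b, xi)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b


def fit_ensemble(X, y, n_models=10, lam=0.05, seed=0):
    """Fit one penalized model per bootstrap resample (assumed ensemble scheme)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam=lam))
    return models


def predict_risk(models, x):
    """Average the predicted probabilities of all ensemble members."""
    return sum(sigmoid(linear_score(w, b, x)) for w, b in models) / len(models)


def selection_counts(models):
    """Number of models in which each factor has a nonzero coefficient."""
    p = len(models[0][0])
    return [sum(1 for w, _ in models if w[j] != 0.0) for j in range(p)]


def make_toy_data(n=150, seed=1):
    """Synthetic data: only the first of three factors affects the outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(2.0 * x[0]) else 0)
    return X, y


X, y = make_toy_data()
models = fit_ensemble(X, y)
print("selection counts per factor:", selection_counts(models))
print("ensemble risk for x = [2, 0, 0]:", predict_risk(models, [2.0, 0.0, 0.0]))
```

The toy ensemble uses 10 models to stay fast; the number of resamples, the penalty strength `lam`, and the step size `lr` are tuning choices that would need to be set from the data.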

Results: The area under the receiver operating characteristic curve (AUC) of the short-term EPLR risk prediction model was 0.800 in the internal validation set and 0.751 in the external validation set. For the long-term EPLT risk prediction model, the AUC was 0.692 and 0.760 in internal and external validation, respectively. The net reclassification improvement index of the EPLT model relative to the Gail model and the Han Chinese Breast Cancer Prediction Model (HCBCP) in external validation was 0.193 and 0.233, respectively, indicating that the EPLT model has higher classification accuracy.
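
The two metrics reported in the Results can be illustrated with a small sketch. It assumes a pairwise (Mann-Whitney) estimate of the area under the receiver operating characteristic curve and a two-category net reclassification improvement with a single risk cutoff. The abstract does not state which net reclassification improvement variant or cutoff was used, so the function names, the cutoff of 0.5, and the scores below are made-up toy values, not study data.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def nri(old_scores, new_scores, labels, cutoff):
    """Two-category net reclassification improvement of new vs. old scores.

    NRI = [P(up | case) - P(down | case)] + [P(down | non-case) - P(up | non-case)],
    where "up" means moving from below the cutoff to at or above it.
    """
    up_case = down_case = up_non = down_non = 0
    n_case = n_non = 0
    for old, new, label in zip(old_scores, new_scores, labels):
        moved_up = old < cutoff <= new
        moved_down = new < cutoff <= old
        if label == 1:
            n_case += 1
            up_case += moved_up
            down_case += moved_down
        else:
            n_non += 1
            up_non += moved_up
            down_non += moved_down
    if n_case == 0 or n_non == 0:
        raise ValueError("need at least one case and one non-case")
    return (up_case - down_case) / n_case + (down_non - up_non) / n_non


# Toy example: 2 cases and 3 non-cases scored by a reference and a new model.
labels = [1, 1, 0, 0, 0]
reference_scores = [0.6, 0.3, 0.4, 0.7, 0.2]
new_scores = [0.8, 0.6, 0.3, 0.4, 0.1]

print("reference AUC:", auc(reference_scores, labels))
print("new AUC:", auc(new_scores, labels))
print("NRI at cutoff 0.5:", nri(reference_scores, new_scores, labels, cutoff=0.5))
```

A positive value means the new scores move cases upward and non-cases downward across the cutoff more often than the reverse, which is the sense in which a positive index favors one model over another.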

Conclusions: We developed the EPLR and EPLT models to screen populations with a high risk of developing BC. These can serve as useful tools to aid in risk-stratified screening and BC prevention.

Conflict of interest statement

None.

Figures

**Figure 1**
The flow chart of invoking the BCCS-CW database. The BCCS-CW database was divided into different databases according to provinces, to train, validate, and test the EPLR and EPLT models. BCCS-CW: Breast Cancer Cohort Study in Chinese Women; EPLR: Ensemble penalized logistic regression; EPLT: Ensemble penalized long-term.

**Figure 2**
ROCs for EPLR model. (A) ROC curves for 51-factor and 72-factor EPLR models. ROC curves show performance of the 72-factor-EPLR model using both the internal validation set (green) and external validation set (yellow). ROC curves show performance of the 51-factor-EPLR model using both the internal validation set (red) and external validation set (blue). The AUC was improved in absolute terms by 4.5% and 8.5% using the 72-factor-EPLR model compared with the 51-factor-EPLR model in the internal and external validation sets, respectively. (B) ROC curves showing performance of the EPLR model using both the internal validation set (green) and external validation set (yellow). ROC curves showing performance of the BCRAM using both the internal validation set (green) and external validation set (blue). The AUC was improved in absolute terms by 10.9% and 6.8% using the EPLR model compared with the BCRAM in the internal and external validation sets, respectively. *Indicates the external validation set. AUC: Area under the receiver operating characteristic curve. EPLR: Ensemble penalized logistic regression; ROC: Receiver operating characteristic.

**Figure 3**
ROC curves and calibration plots for EPLT, Gail, and HCBCP models. Orange, blue, and green represent curves or plots of EPLT, Gail, and HCBCP models, respectively. (A) ROC curves for internal validation set and (B) external validation set; (C) calibration plots for internal validation set and (D) external validation set. EPLT: Ensemble penalized long-term; ROC: Receiver operating characteristic.

**Figure 4**
Score of importance for each risk factor. The number of occurrences of each of the 72 risk factors across 200 PLR models trained on data from all of Shandong Province, used to quantify the impact of risk factors on breast cancer incidence. BMI: Body mass index; PLR: Penalized logistic regression; WHR: Weight-to-height ratio.

Cited by

Harnessing Artificial Intelligence to Enhance Global Breast Cancer Care: A Scoping Review of Applications, Outcomes, and Challenges.
Chia JLL, He GS, Ngiam KY, Hartman M, Ng QX, Goh SSN. Cancers (Basel). 2025 Jan 9;17(2):197. doi: 10.3390/cancers17020197. PMID: 39857979. Review.
