Comparison of Machine Learning Models for Classification of Breast Cancer Risk Based on Clinical Data

Affiliations

¹ Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran.
² School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.
³ Department of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.
⁴ Osteoporosis Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran.
⁵ School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
⁶ Faculty of Mechanical Engineering, K. N. Toosi University of Technology, Tehran, Iran.
⁷ School of Mechanical Engineering, University of Tehran, Tehran, Iran.
⁸ Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham, UK.
⁹ Department of Computing, School of Digital, Technologies and Arts, Staffordshire University, Stoke-on-Trent, UK.

PMID: 40176498
PMCID: PMC11965882
DOI: 10.1002/cnr2.70175

Comparative Study

Haniyeh Rafiepoor et al. Cancer Rep (Hoboken). 2025 Apr;8(4):e70175. doi: 10.1002/cnr2.70175.

Abstract

Background: Breast cancer (BC) is a major global health concern, with rising incidence and mortality rates in many developing countries. Effective BC risk assessment models are crucial for prevention and early detection. Although the Gail model, a traditional logistic regression-based model, has been widely used, its predictive performance may be limited by its linear assumptions. With the rapid advancement of artificial intelligence (AI) in the medical sciences, various complex machine learning algorithms have been developed for risk prediction, including for BC.
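For orientation, the "linear assumptions" mentioned above can be illustrated with the generic form of a logistic regression model. This is a textbook sketch only: the symbols p, x_j, and β_j are notation introduced here, and the expression does not reproduce the Gail model's actual risk factors or coefficients.

```latex
\[
  \log\frac{p(\mathbf{x})}{1 - p(\mathbf{x})}
  \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_j
\]
```

Here p(x) is the modeled probability of the outcome for covariates x = (x_1, ..., x_k). The log-odds depend on each covariate only through a weighted sum, so interactions or nonlinear effects are captured only if they are added explicitly as extra terms.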

Aims: This study compares the performance of AI-based models with that of the traditional Gail model in assessing and predicting BC risk using a population dataset.

Methods and results: This study involved 942 newly diagnosed BC patients and 975 healthy controls at the Cancer Institute in the IKH Hospital Complex, Tehran. Ten classification algorithms were applied to the dataset. The accuracy, sensitivity, and precision of the machine learning algorithms, together with their feature importance, were assessed and compared with those reported in previous studies. AI algorithms alone did not significantly improve predictive performance compared to the Gail model. However, the importance of variables varied significantly among the AI algorithms. Understanding feature importance and interactions is crucial in AI modeling in order to enhance accuracy and identify critical risk factors.
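As a reading aid, the evaluation metrics named above (accuracy, sensitivity, precision) can be computed from binary labels as in the following minimal Python sketch. The function name `classification_metrics`, the `Metrics` container, and the toy labels are illustrative assumptions, not the study's code or data; the convention 1 = case, 0 = control is also assumed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    precision: float


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute accuracy, sensitivity, and precision for binary labels (1 = case, 0 = control)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    nan = float("nan")
    return Metrics(
        accuracy=(tp + tn) / total if total else nan,
        sensitivity=tp / (tp + fn) if (tp + fn) else nan,  # share of true cases detected
        precision=tp / (tp + fp) if (tp + fp) else nan,    # share of predicted cases that are true
    )


# Toy labels for illustration only (not the study's data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
m = classification_metrics(y_true, y_pred)
print(m)
```

Applying the same function to the predictions of each classifier on a shared validation set gives directly comparable figures across models.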

Conclusion: Incorporating specific risk factors, such as genetic and image-related variables, may be necessary to further enhance the accuracy of BC risk prediction models. Future research should also address the modeling issues that arise in models with a restricted number of features.

Keywords: artificial intelligence; breast cancer; conventional models; machine learning; risk assessment.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**FIGURE 1**
ROC curves for all algorithms on the validation set. Gradient boosting achieved the highest validation performance (AUC = 0.65).

**FIGURE 2**
Prediction partition analysis of breast cancer risk prediction. Red: Cases, blue: Controls.
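The AUC reported with the ROC curves in Figure 1 can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it that way by pairwise comparison; the function name `roc_auc` and the toy scores are assumptions for illustration, not the study's implementation or results.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of case and control scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]  # scores of cases
    neg = [s for t, s in zip(y_true, scores) if t == 0]  # scores of controls
    if not pos or not neg:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy scores for illustration only (not the study's data).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # prints 0.75
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.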
