Front Oncol. 2023 Feb 16;13:1103369.
doi: 10.3389/fonc.2023.1103369. eCollection 2023.

Applications of different machine learning approaches in prediction of breast cancer diagnosis delay


Samira Dehdar et al. Front Oncol.

Abstract

Background: The rising incidence and mortality of breast cancer (BC) in Iran have made this disease a major health challenge. Delayed diagnosis leads to more advanced stages of BC and a lower chance of survival, making this cancer even more deadly.

Objectives: This study aimed to identify the factors that predict delayed BC diagnosis among women in Iran.

Methods: Four machine learning methods were applied to the data of 630 women with confirmed BC: extreme gradient boosting (XGBoost), random forest (RF), neural networks (NNs), and logistic regression (LR). Statistical tests and performance measures, including the chi-square test, p-values, sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC), were used at different steps of the study.
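The chi-square step mentioned in the Methods can be illustrated with a minimal sketch, assuming it is a Pearson test of independence between a binary factor (for example, urban versus rural residency) and delayed versus timely diagnosis. The function name `chi_square_2x2` and the counts in the usage example are hypothetical, not data from the study. The abstract does not say whether a continuity correction was applied; this sketch applies none and handles only the 2x2 (1 degree of freedom) case.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table = [[a, b], [c, d]]: rows are the two levels of a factor,
    columns are delayed / not delayed diagnosis. No continuity correction.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT from the study: rows = urban / rural,
# columns = delayed / not delayed.
chi2, p = chi_square_2x2([[30, 70], [10, 90]])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # → chi2 = 12.50, p = 0.0004
```

The closed-form p-value only holds for one degree of freedom; larger tables need the general chi-square distribution, and the usual rule of thumb is that expected counts should not be too small (commonly at least 5 per cell).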

Results: Thirty percent of patients had a delayed BC diagnosis. Of the patients with a delayed diagnosis, 88.5% were married, 72.1% lived in urban areas, and 84.8% had health insurance. The three most important factors in the RF model were urban residency (12.04), history of breast disease (11.58), and other comorbidities (10.72). In the XGBoost model, the top factors were urban residency (17.54), other comorbidities (17.14), and age at first childbirth over 30 (13.13); in the LR model, they were other comorbidities (49.41), older age at first childbirth (82.57), and nulliparity (44.19). Finally, in the NN model, being married (50.05), marriage age above 30 (18.03), and a history of other breast diseases (15.83) were the main predictors of delayed BC diagnosis.
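The abstract reports an importance score for each factor but does not state how each model computed it, and the scales plainly differ between models (for example, 82.57 in LR versus 17.54 in XGBoost). As one model-agnostic way to rank factors, the sketch below uses permutation importance: shuffle one feature column and measure how much accuracy drops. This technique is swapped in for illustration and is not necessarily what the authors used; the model `toy_model`, the feature layout, and the data are all hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled.

    A larger drop means the model relies more on that feature; a drop of
    zero means the model's accuracy does not depend on it.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: predicts delay (1) from feature 0 (say, urban
# residency) and ignores feature 1 entirely.
def toy_model(row):
    return 1 if row[0] == 1 else 0


X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y = [row[0] for row in X]
scores = permutation_importance(toy_model, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Because the measure is a drop in a shared metric, scores from different models computed this way are on the same scale, which is not guaranteed for model-specific importance measures.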

Conclusion: The machine learning models suggest that women living in urban areas, women who married or had their first child after age 30, and women without children are at higher risk of diagnosis delay. These women should be educated about BC risk factors, symptoms, and breast self-examination to shorten diagnostic delay.

Keywords: breast cancer (BC); delay; extreme gradient boosting; logistic regression; machine learning; neural networks (NN); random forest (RF).


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Steps for building the prediction model and the statistical methods used in each step.
Figure 2. Logistic regression curve.
Figure 3. Receiver operating characteristic (ROC) curves of the four machine learning (ML) models applied, with the area under the curve (AUC) given for each model.
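Figure 3 compares the models by their ROC curves and AUC. A minimal sketch of how such a curve and its AUC can be computed from a model's predicted scores, assuming a binary outcome (1 = delayed diagnosis), a rule that predicts positive when the score is at or above the threshold, and trapezoidal integration. Sensitivity, specificity, and accuracy at a single threshold come from the same confusion-matrix counts. The function names and example scores are hypothetical, not the study's code or data.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy at one threshold (needs both classes)."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """ROC points (1 - specificity, sensitivity) at each distinct score, plus trapezoidal AUC."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        sensitivity, specificity, _ = classification_metrics(y_true, scores, t)
        points.append((1 - specificity, sensitivity))
    # The lowest threshold predicts every case positive, so the curve ends at (1, 1).
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Hypothetical labels (1 = delayed diagnosis) and model scores, NOT study data.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(classification_metrics(y_true, scores))  # → (0.5, 1.0, 0.75)
points, auc = roc_auc(y_true, scores)
print(auc)  # → 0.75
```

Sweeping every distinct score as a threshold traces the full curve, and the trapezoidal area gives tied scores half credit, so an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.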
