Understanding Cancer Risk Among Bangladeshi Women: An Explainable Machine Learning Approach to Socio-Reproductive Factors Using Tertiary Hospital Data
- PMID: 40565458
- PMCID: PMC12192815
- DOI: 10.3390/healthcare13121432
Abstract
Background: Breast cancer poses a significant health challenge in Bangladesh, where limited screening and unique reproductive patterns contribute to delayed diagnoses and subtype-specific disparities. While reproductive risk factors such as age at menarche, parity, and contraceptive use are well studied in high-income countries, their associations with hormone-receptor-positive (HR+) and triple-negative breast cancer (TNBC) remain underexplored in low-resource settings.
Methods: A case-control study was conducted at the National Institute of Cancer Research and Hospital (NICRH), including 486 histopathologically confirmed breast cancer cases (246 HR+, 240 TNBC) and 443 cancer-free controls. Socio-demographic and reproductive data were collected through structured interviews. Machine learning models, including Logistic Regression, Lasso, Support Vector Machines, Random Forest, and XGBoost, were trained using stratified five-fold cross-validation. Model performance was evaluated using sensitivity, F1-score, and area under the receiver operating characteristic curve (AUROC). To interpret model predictions and quantify the contribution of individual features, we employed SHapley Additive exPlanations (SHAP) values.
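The evaluation protocol described in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The helper names (`stratified_folds`, `sensitivity_f1`, `auroc`), the random seed, and the four-sample scoring example are assumptions made for illustration. Only the group sizes (486 cases, 443 controls) come from the abstract.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def sensitivity_f1(y_true, y_pred):
    """Return (sensitivity, F1) for binary labels where 1 = case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return sens, f1


def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Group sizes from the abstract: 486 cases (label 1), 443 controls (label 0).
labels = [1] * 486 + [0] * 443
folds = stratified_folds(labels, k=5, seed=0)

# Hypothetical toy predictions, used only to exercise the metric helpers.
toy_truth = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
toy_sens, toy_f1 = sensitivity_f1(toy_truth, toy_pred)
toy_auc = auroc(toy_truth, toy_scores)
```

In each round of cross-validation, one fold would serve as the held-out test set and the remaining four as training data. Stratification keeps the case-to-control ratio roughly constant across folds, so per-fold sensitivity and F1 remain comparable.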
Results: XGBoost achieved the highest overall performance (F1-score = 0.750), and SHAP-based interpretability revealed key predictors for each subtype. Rural residence, low education (≤5 years), and undernutrition were significant predictors across subtypes. Cesarean delivery and multiple abortions were more predictive of TNBC, while urban residence, employment, and higher education were more predictive of HR+. Age at menarche and age at first childbirth showed decreasing predictive importance with increasing age for HR+, while larger gaps between marriage and childbirth were more predictive of TNBC.
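The SHAP-based attributions reported above rest on Shapley values, which split a prediction among features by averaging each feature's marginal contribution over all feature subsets. The following standard-library Python sketch computes exact Shapley values for a toy scoring function. The function `toy_risk`, its coefficients, and the three binary features are hypothetical and are not taken from the study. Replacing absent features with a single baseline vector is a simplifying assumption, whereas SHAP implementations for tree models such as XGBoost typically average over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical score: two main effects plus one interaction term."""
    rural, low_education, undernutrition = z
    return 0.2 * rural + 0.1 * low_education + 0.3 * rural * undernutrition


# Explain one hypothetical individual relative to an all-zero baseline.
phi = shapley_values(toy_risk, x=[1, 1, 1], baseline=[0, 0, 0])
```

In this toy case the interaction term is shared equally between the two features involved, and the attributions sum to the difference between the individual's score and the baseline score. That additivity property is what lets SHAP values be read as per-feature contributions.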
Conclusions: Our findings underscore the value of machine learning coupled with SHAP-based explainability in identifying context-specific risk factors for breast cancer subtypes in resource-limited settings. This approach enhances transparency and supports the development of targeted public health interventions to reduce breast cancer disparities in Bangladesh.
Keywords: breast cancer risk; explainable machine learning; women reproductive risk factors.
Conflict of interest statement
The authors declare no conflicts of interest.
