BMC Cancer. 2025 Feb 5;25(1):206.
doi: 10.1186/s12885-025-13556-8.

Breast cancer risk assessment based on a predictive model: evaluation of risk factors among Japanese women

Michiyo Yamada et al. BMC Cancer. 2025.

Abstract

Background: Each breast cancer (BC) risk factor has different effects in different populations. However, no well-studied and validated BC risk prediction models exist for Japanese women. We developed accessible predictive models with optimal variables to evaluate risk factors in Japanese women. The models are intended for use by both medical institutions and women themselves, to support primary BC prevention and to increase the BC screening rate. We also evaluated the characteristics of risk factors in this population and the diversity of their distribution.

Methods: This retrospective case-control study of 2,494 Japanese women used data from an original, paper-based questionnaire. Logistic regression models were built separately for each menopausal-status group (PRE, premenopausal; PERI, perimenopausal; and POST, postmenopausal) and included 18 variables from 6 risk factors. Models were evaluated using the Akaike Information Criterion (AIC), the area under the receiver operating characteristic curve (AUC), and internal validation. Bootstrap methods were used for bias correction of discrimination and calibration, and standard deviations were calculated using the modified case-control ratio.
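Candidate logistic regression models can be compared by the Akaike Information Criterion, AIC = 2k − 2 ln L, where k is the number of estimated coefficients (including the intercept) and L is the maximized likelihood; the lowest AIC is preferred. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it fits every subset of two hypothetical variables (`x1`, `x2`) on synthetic data, uses plain gradient ascent in place of the Newton/IRLS fitting that statistical packages normally use, and keeps the subset with the lowest AIC. The helper names `fit_logistic` and `select_by_aic` are hypothetical, and how the study's candidate models were built from its 18 variables is not described here.

```python
import itertools
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, y, lr=0.5, n_iter=2000):
    # Batch gradient ascent on the mean log-likelihood (illustrative only).
    # Each row already starts with 1.0 for the intercept.
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j, v in enumerate(r):
                grad[j] += (yi - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def log_likelihood(beta, rows, y, eps=1e-12):
    # Bernoulli log-likelihood, with probabilities clipped away from 0 and 1.
    ll = 0.0
    for r, yi in zip(rows, y):
        p = min(max(sigmoid(sum(b * v for b, v in zip(beta, r))), eps), 1 - eps)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll


def select_by_aic(X, y, names):
    # Fit every subset of the candidate variables (including the
    # intercept-only model); AIC = 2k - 2 log L, with k counting the intercept.
    results = {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(range(len(names)), size):
            rows = [[1.0] + [x[j] for j in subset] for x in X]
            beta = fit_logistic(rows, y)
            aic = 2 * len(beta) - 2 * log_likelihood(beta, rows, y)
            results[tuple(names[j] for j in subset)] = aic
    best = min(results, key=results.get)
    return best, results[best], results


# Hypothetical demo data: x1 is associated with case status, x2 is noise.
X_demo = [[(i - 9.5) / 10, ((2 * i) % 5 - 2) / 2] for i in range(20)]
y_demo = [1 if i >= 10 else 0 for i in range(20)]
y_demo[8], y_demo[12] = 1, 0  # overlap, so the classes are not perfectly separated
best, best_aic, all_aic = select_by_aic(X_demo, y_demo, ["x1", "x2"])
print(best, round(best_aic, 2))
```

Exhaustive search is only practical for a handful of variables; it is used here because it makes the AIC comparison explicit.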

Results: We created and evaluated 432 candidate models for each group. Notably, body mass index (BMI), parity, family history (FHx), and smoking history increased risk in all groups. Risk-reducing factors included breastfeeding duration in the PRE and PERI models and regular alcohol consumption in the PERI and POST models. Age was associated with reduced risk in the PERI model but increased risk in the POST model. Variable selection for parity and FHx differed between the PRE and PERI models on the one hand and the POST model on the other. The models had moderate discriminatory accuracy: the AUCs (confidence intervals) of the PRE, PERI, and POST models were 0.669 (0.625–0.715), 0.669 (0.632–0.702), and 0.659 (0.627–0.693), respectively. Bias-corrected AUCs (standard deviations) were 0.697 (0.041) for PRE, 0.684 (0.033) for PERI, and 0.674 (0.031) for POST. The models were well calibrated after bias correction.
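The abstract reports bootstrap bias-corrected AUCs without giving the procedure. One widely used version is Harrell's optimism bootstrap: refit the model on each resample, measure how much higher its AUC is on that resample than on the original data, and subtract the average gap from the apparent AUC. The sketch below implements that version with a Mann-Whitney AUC. The data, the helper names (`fit_logistic`, `auc`, `optimism_corrected_auc`), and the gradient-ascent fit are hypothetical assumptions for illustration; the authors' procedure, which also involved the modified case-control ratio, may differ.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, n_iter=500):
    # Simple batch gradient ascent; an intercept column is prepended here.
    rows = [[1.0] + list(x) for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j, v in enumerate(r):
                grad[j] += (yi - p) * v
        beta = [b + lr * g / len(rows) for b, g in zip(beta, grad)]
    return beta


def predict(beta, X):
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]


def auc(scores, y):
    # Mann-Whitney form: share of case-control pairs in which the case
    # scores higher; ties count one half.
    cases = [s for s, yi in zip(scores, y) if yi == 1]
    controls = [s for s, yi in zip(scores, y) if yi == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def optimism_corrected_auc(X, y, n_boot=30, seed=0):
    # Harrell-style optimism bootstrap; returns (apparent AUC, corrected AUC).
    rng = random.Random(seed)
    apparent = auc(predict(fit_logistic(X, y), X), y)
    n = len(y)
    gaps = []
    while len(gaps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample needs both cases and controls
        Xb = [X[i] for i in idx]
        beta = fit_logistic(Xb, yb)
        # Optimism: AUC on the resample minus AUC on the original data.
        gaps.append(auc(predict(beta, Xb), yb) - auc(predict(beta, X), y))
    return apparent, apparent - sum(gaps) / n_boot


# Hypothetical demo data with a single predictor.
X_demo = [[(i - 9.5) / 10] for i in range(20)]
y_demo = [1 if i >= 10 else 0 for i in range(20)]
y_demo[8], y_demo[12] = 1, 0
apparent, corrected = optimism_corrected_auc(X_demo, y_demo)
print(round(apparent, 3), round(corrected, 3))
```

In practice a few hundred resamples and a proper maximum-likelihood fitter would be used; the small numbers here only keep the example fast.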

Conclusion: Our widely available, simple, and cost-effective models with optimal variables may help characterize certain genetic and environmental risk factors for BC in Japanese women.

Keywords: Breast cancer risk assessment; Diverse alcoholic effects; Japanese women; Late childbearing; Optimal variables; Predictive model.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the ethics committee of Yokohama City University Hospital on February 1, 2013 (No: B130110035). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Flow diagram of dataset composition. (a) Items with more than 12% missing values were excluded. (b) Questionnaires with a blank in any of the six selected risk factors were excluded. (c) Distribution of completed questionnaires from patients with breast cancer and healthy controls, stratified by age (age at diagnosis for cases or age at the time of the survey for controls) as an estimate of menopausal status: premenopausal (PRE; 20–44 years), perimenopausal (PERI; 45–55 years), and postmenopausal (POST; 56–80 years).
Fig. 2
ROC curves and calibration plots of the risk model for each group. The upper panels show the ROC curve of the risk model in each group to assess its discrimination; dotted straight lines indicate an AUC of 50% for reference. The AUC was 0.669 for PRE, 0.669 for PERI, and 0.659 for POST. The lower panels are calibration plots comparing predicted probabilities with observed outcomes in each group. Grouped proportions were computed with an average of 20 observations per group, and triangles represent these grouped observations. Thick gray diagonal lines represent ideal calibration, where predicted probability equals observed probability. Dashed lines represent smoothed nonparametric estimates, and solid curves represent the logistic calibration curves. The histogram shows the proportion of all observations at each predicted probability. AUC, area under the ROC curve; PRE, premenopausal (age, 20–44 years); PERI, perimenopausal (age, 45–55 years); POST, postmenopausal (age, 56–80 years); ROC, receiver operating characteristic
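The grouped proportions in the calibration plots pair each group's mean predicted probability with its observed proportion of cases. A minimal sketch, assuming groups are formed by sorting on predicted probability and cutting into consecutive, roughly equal-size chunks (the exact grouping rule used for the figure is not stated); the function name `grouped_calibration` is hypothetical.

```python
def grouped_calibration(probs, outcomes, group_size=20):
    # Sort observations by predicted probability and cut them into
    # consecutive groups of about `group_size` observations each.
    # Returns (mean predicted probability, observed proportion, n) per group.
    pairs = sorted(zip(probs, outcomes))
    if not pairs:
        return []
    n_groups = max(1, round(len(pairs) / group_size))
    groups = []
    for g in range(n_groups):
        lo = g * len(pairs) // n_groups
        hi = (g + 1) * len(pairs) // n_groups
        chunk = pairs[lo:hi]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        groups.append((mean_pred, observed, len(chunk)))
    return groups


# Hypothetical usage with eight observations and groups of four.
groups = grouped_calibration(
    [0.9, 0.1, 0.6, 0.3, 0.8, 0.2, 0.7, 0.4],
    [1, 0, 1, 1, 1, 0, 0, 0],
    group_size=4,
)
print(groups)
```

Points lying on the diagonal (mean predicted equal to observed) indicate good calibration in that probability range.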
Fig. 3
ROC curves and calibration plots of the risk model for each group, generated after adjusting for case–control sampling by reweighting cases and controls to a case–control ratio of 1:9. The upper panels show the ROC curve of the risk model in each group to assess its discrimination; dotted straight lines indicate an AUC of 50% for reference. The AUC was 0.697 for PRE, 0.684 for PERI, and 0.674 for POST. The lower panels are calibration plots comparing predicted probabilities with observed outcomes in each group. Grouped proportions were computed with an average of 20 observations per group, and triangles represent these grouped observations. Thick gray diagonal lines represent ideal calibration, where predicted probability equals observed probability. Dashed lines represent smoothed nonparametric estimates, and solid curves represent the logistic calibration curves. The histogram shows the proportion of all observations at each predicted probability. The Brier scores were 0.083 for PRE, 0.080 for PERI, and 0.080 for POST. Spiegelhalter’s z-scores and p-values were −0.083 and 0.934 for PRE, −0.029 and 0.977 for PERI, and −0.210 and 0.834 for POST. AUC, area under the ROC curve; PRE, premenopausal (age, 20–44 years); PERI, perimenopausal (age, 45–55 years); POST, postmenopausal (age, 56–80 years); ROC, receiver operating characteristic
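The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome. Spiegelhalter's z-test assesses calibration with z = Σ(yᵢ − pᵢ)(1 − 2pᵢ) / √Σ(1 − 2pᵢ)²pᵢ(1 − pᵢ), and the two-sided p-value comes from the standard normal distribution. The sketch below is an unweighted illustration with hypothetical function names; it does not reproduce the case–control reweighting used for this figure. As a consistency check, the two-sided p-values it computes for the reported z-scores (−0.083, −0.029, −0.210) round to the reported 0.934, 0.977, and 0.834.

```python
import math


def brier_score(probs, outcomes):
    # Mean squared difference between predicted probability and 0/1 outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def two_sided_p(z):
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def spiegelhalter_z(probs, outcomes):
    # Spiegelhalter's z-test for calibration; returns (z, two-sided p).
    num = sum((y - p) * (1 - 2 * p) for p, y in zip(probs, outcomes))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probs)
    if var == 0:
        raise ValueError("variance is zero (e.g., all predictions equal 0.5)")
    z = num / math.sqrt(var)
    return z, two_sided_p(z)


# Check against the z-scores reported in the caption above.
for z in (-0.083, -0.029, -0.210):
    print(z, round(two_sided_p(z), 3))
```

A z-score near zero (large p-value) means the observed outcomes are consistent with the predicted probabilities, which matches the caption's description of well-calibrated models.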
