Meta-Analysis

BMJ. 2023 May 10:381:e073800.

doi: 10.1136/bmj-2022-073800.

Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study

Ash Kieran Clift¹ ², David Dodwell³, Simon Lord⁴, Stavros Petrou², Michael Brady⁴, Gary S Collins⁵, Julia Hippisley-Cox²

Affiliations

¹ Cancer Research UK Oxford Centre, Oxford, UK ashley.clift@phc.ox.ac.uk.
² Nuffield Department of Primary Care Health Sciences, Radcliffe Primary Care Building, Radcliffe Observatory Quarter, University of Oxford, Oxford OX2 6GG, UK.
³ Nuffield Department of Population Health, University of Oxford, Oxford, UK.
⁴ Department of Oncology, University of Oxford, Oxford, UK.
⁵ Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.

PMID: 37164379
PMCID: PMC10170264
DOI: 10.1136/bmj-2022-073800

Ash Kieran Clift et al. BMJ. 2023.

Abstract

Objective: To develop a clinically useful model that estimates the 10 year risk of breast cancer related mortality in women (self-reported female sex) with breast cancer of any stage, comparing results from regression and machine learning approaches.

Design: Population based cohort study.

Setting: QResearch primary care database in England, with individual level linkage to the national cancer registry, Hospital Episodes Statistics, and national mortality registers.

Participants: 141 765 women aged 20 years and older with a diagnosis of invasive breast cancer between 1 January 2000 and 31 December 2020.

Main outcome measures: Four model building strategies comprising two regression (Cox proportional hazards and competing risks regression) and two machine learning (XGBoost and an artificial neural network) approaches. Internal-external cross validation was used for model evaluation. Random effects meta-analysis that pooled estimates of discrimination and calibration metrics, calibration plots, and decision curve analysis were used to assess model performance, transportability, and clinical utility.
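The pooling of region level performance estimates described above can be sketched as follows. This is a minimal illustration that assumes the DerSimonian-Laird estimator for the between-region variance; the abstract does not state which estimator was used, so this is a stand-in rather than the study's method. The function name `random_effects_pool`, the region values, and the caller-supplied `t_crit` (the 97.5th centile of a t distribution with k-2 degrees of freedom) are hypothetical.

```python
import math
from statistics import NormalDist


def random_effects_pool(estimates, std_errors, t_crit=None):
    """Pool per-region estimates with a DerSimonian-Laird random effects model.

    estimates  : per-region metric values (for example Harrell's C index)
    std_errors : their standard errors
    t_crit     : optional t critical value (k-2 degrees of freedom) used for
                 the 95% prediction interval; supplied by the caller
    """
    k = len(estimates)
    # Inverse variance (fixed effect) weights.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q and the method-of-moments estimate of tau squared.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effects weights add tau squared to each within-region variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.975)
    ci = (pooled - z * se_pooled, pooled + z * se_pooled)
    pi = None
    if t_crit is not None:
        # The prediction interval widens the CI by the between-region spread.
        half = t_crit * math.sqrt(tau2 + se_pooled ** 2)
        pi = (pooled - half, pooled + half)
    return {"pooled": pooled, "tau2": tau2, "ci": ci, "pi": pi}


# Hypothetical C index estimates from five held-out regions.
regions_c = [0.861, 0.852, 0.866, 0.849, 0.858]
regions_se = [0.006, 0.007, 0.005, 0.008, 0.006]
result = random_effects_pool(regions_c, regions_se, t_crit=3.182)  # t(0.975, df=3)
print(result)
```

In internal-external cross validation each region is held out in turn, the model is fitted on the remaining regions, and the held-out estimates are what would be passed to a function like this one. The prediction interval indicates the range of performance that might be expected in a new, similar region.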

Results: During a median 4.16 years (interquartile range 1.76-8.26) of follow-up, 21 688 breast cancer related deaths and 11 454 deaths from other causes occurred. Restricting to 10 years maximum follow-up from breast cancer diagnosis, 20 367 breast cancer related deaths occurred during a total of 688 564.81 person years. The crude breast cancer mortality rate was 295.79 per 10 000 person years (95% confidence interval 291.75 to 299.88). Predictors varied for each regression model, but both Cox and competing risks models included age at diagnosis, body mass index, smoking status, route to diagnosis, hormone receptor status, cancer stage, and grade of breast cancer. The Cox model's random effects meta-analysis pooled estimate for Harrell's C index was the highest of any model at 0.858 (95% confidence interval 0.853 to 0.864, and 95% prediction interval 0.843 to 0.873). It appeared acceptably calibrated on calibration plots. The competing risks regression model had good discrimination: pooled Harrell's C index 0.849 (0.839 to 0.859, and 0.821 to 0.876), and evidence of systematic miscalibration on summary metrics was lacking. The machine learning models had acceptable discrimination overall (Harrell's C index: XGBoost 0.821 (0.813 to 0.828, and 0.805 to 0.837); neural network 0.847 (0.835 to 0.858, and 0.816 to 0.878)), but had more complex patterns of miscalibration and more variable regional and stage specific performance. Decision curve analysis suggested that the Cox and competing risks regression models tested may have higher clinical utility than the two machine learning approaches.
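The crude mortality rate reported above follows directly from the death count and person years. The short calculation below is a sketch: the confidence interval uses a normal approximation on the log rate scale, which is an assumption because the abstract does not say how the interval was derived. The function name `crude_rate_per_10000` is hypothetical.

```python
import math


def crude_rate_per_10000(deaths, person_years, z=1.96):
    """Return (rate, lower, upper) per 10 000 person years.

    The interval assumes a Poisson death count and a normal approximation
    on the log scale, where SE(log rate) = 1 / sqrt(deaths).
    """
    rate = deaths / person_years * 10_000
    se_log = 1.0 / math.sqrt(deaths)
    lower = rate * math.exp(-z * se_log)
    upper = rate * math.exp(z * se_log)
    return rate, lower, upper


# Figures taken from the Results paragraph (10 year maximum follow-up).
rate, lower, upper = crude_rate_per_10000(20367, 688564.81)
print(f"{rate:.2f} per 10 000 person years ({lower:.2f} to {upper:.2f})")
```

Under this assumed method the result agrees with the reported 295.79 (291.75 to 299.88) to two decimal places.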

Conclusion: In women with breast cancer of any stage, using the predictors available in this dataset, regression based methods had better and more consistent performance compared with machine learning approaches and may be worthy of further evaluation for potential clinical use, such as for stratified follow-up.

Conflict of interest statement

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: JHC is an unpaid director of QResearch (a not-for-profit organisation that is a partnership between the University of Oxford and EMIS Health that supply the QResearch database) and is a founder and shareholder of ClinRisk and was its medical director until 31 May 2019 (ClinRisk produces open and closed source software to implement clinical risk algorithms including two breast cancer risk models implemented in the National Health Service into clinical computer systems).

Figures

**Fig 1**
Summary of internal-external cross validation framework used to evaluate model performance for several metrics, and transportability

**Fig 2**
Final Cox proportional hazards model predicting 10 year risk of breast cancer mortality, presented as its exponentiated coefficients (hazard ratios with 95% confidence intervals). Model contains fractional polynomial terms for age (0.5, 2) and body mass index (2, 2), but these are not plotted owing to reasons of scale. Model also includes a baseline survival term (not plotted; the full model as coefficients is presented in the supplementary file). ACE=angiotensin converting enzyme; CI=confidence interval; CKD=chronic kidney disease; ER=oestrogen receptor; GP=general practitioner; HER2=human epidermal growth factor receptor 2; HRT=hormone replacement therapy; PR=progesterone receptor; RAA=renin-angiotensin aldosterone; SSRI=selective serotonin reuptake inhibitor

**Fig 3**
Results from internal-external cross validation of Cox proportional hazards model for Harrell’s C index. Plots display region level performance metric estimates and 95% confidence intervals (diamonds with lines), and an overall pooled estimate obtained using random effects meta-analysis and 95% confidence interval (lowest diamond) and 95% prediction interval (line through lowest diamond). CI=confidence interval

**Fig 4**
Results from internal-external cross validation of Cox proportional hazards model for calibration slope. Plots display region level performance metric estimates and 95% confidence intervals (diamonds with lines), and an overall pooled estimate obtained using random effects meta-analysis and 95% confidence interval (lowest diamond) and 95% prediction interval (line through lowest diamond). CI=confidence interval

**Fig 5**
Results from internal-external cross validation of Cox proportional hazards model for calibration-in-the-large. Plots display region level performance metric estimates and 95% confidence intervals (diamonds with lines), and an overall pooled estimate obtained using random effects meta-analysis and 95% confidence interval (lowest diamond) and 95% prediction interval (line through lowest diamond). CI=confidence interval

**Fig 6**
Calibration of the four models tested. Top row shows the alignment between predicted and observed risks for all models with smoothed calibration plots. Bottom row summarises the distribution of predicted risks from each model as histograms

**Fig 7**
Final competing risks regression model predicting 10 year risk of breast cancer mortality, presented as its exponentiated coefficients (subdistribution hazard ratios with 95% confidence intervals). Model contains fractional polynomial terms for age (1, 2) and body mass index (2, 2), but these are not plotted owing to reasons of scale. Model also includes an intercept term (not plotted—see supplementary file for full model as coefficients). CI=confidence interval; ER=oestrogen receptor; GP=general practitioner; HER2=human epidermal growth factor receptor 2; HRT=hormone replacement therapy; PR=progesterone receptor

**Fig 8**
Decision curves to assess clinical utility (net benefit) of using each model. Top plot accounts for the competing risk of other cause mortality. Bottom plot does not account for competing risks
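
The net benefit plotted in a decision curve can be illustrated with a short sketch. It assumes a simple binary outcome with no censoring or competing risks, which is a deliberate simplification of the survival setting in this study; the function names and the toy data are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of intervening when predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    outcomes are 1 (event) or 0 (no event); no censoring is handled here.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene for everyone regardless of risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: four patients with predicted risks and observed outcomes.
risks = [0.9, 0.8, 0.2, 0.1]
events = [1, 0, 1, 0]
for pt in (0.2, 0.5):
    print(pt, net_benefit(risks, events, pt), net_benefit_treat_all(events, pt))
```

A decision curve repeats this calculation across a range of thresholds; a model is considered potentially useful at thresholds where its net benefit exceeds both the treat-all and treat-none (net benefit of zero) strategies.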




Grants and funding

27294/CRUK_/Cancer Research UK/United Kingdom
