Constructing multicancer risk cohorts using national data from medical helplines and secondary care
- PMID: 40866501
- PMCID: PMC12391443
- DOI: 10.1038/s41746-025-01855-0
Constructing multicancer risk cohorts using national data from medical helplines and secondary care
Abstract
Identification of cohorts at higher risk of cancer can enable earlier diagnosis of the disease, which significantly improves patient outcomes. In this study, we select nine cancer sites with high incidence of late-stage diagnosis or worsening survival rates, and where there are currently no national screening programmes. We use data from medical helplines (NHS 111) and secondary care appointments from all hospitals in England. We show that features based on information captured in NHS 111 calls are among the most influential in driving predictions of a future cancer diagnosis. Our predictive models exhibit good discrimination, ranging from 0.69 (ovarian cancer) to 0.83 (oesophageal cancer). We present an approach of constructing cohorts at higher risk of cancer based on feature importance and considering possible bias in model results. This approach is flexible and can be tailored based on data availability and the group the intervention targets (i.e. symptomatic or asymptomatic patients).
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Figures
References
-
- Appelbaum, L. et al. Development and validation of a pancreatic cancer risk model for the general population using electronic health records: an observational study. Eur. J. Cancer143, 19–30 (2021). - PubMed
-
- Wang, Y. H., Nguyen, P. A., Mohaimenul Islam, M., Li, Y. C. & Yang, H. C. Development of deep learning algorithm for detection of colorectal cancer in EHR data. in Studies in Health Technology and Informatics Vol. 264, 438–441 (IOS Press, 2019). - PubMed
LinkOut - more resources
Full Text Sources
