Can we screen for pancreatic cancer? Identifying a sub-population of patients at high risk of subsequent diagnosis using machine learning techniques applied to primary care data
- PMID: 34077433
- PMCID: PMC8171946
- DOI: 10.1371/journal.pone.0251876
Abstract
Background: Pancreatic cancer (PC) represents a substantial public health burden. Survival among pancreatic cancer patients is very low because it is difficult to identify cancers early, when the tumour is still localised to the site of origin and treatable. Recent progress has been made in identifying biomarkers for PC in the blood and urine, but these cannot be used for population-based screening, as this would be prohibitively expensive and potentially harmful.
Methods: We conducted a case-control study using prospectively collected electronic health records from primary care, individually linked to cancer registrations. Our cases comprised 1,139 patients, aged 15-99 years, diagnosed with pancreatic cancer between January 1, 2005 and June 30, 2009. Each case was matched on age, sex and time of diagnosis to four controls who were patients with a non-pancreatic cancer. Disease and prescription codes for the 24 months prior to diagnosis were used to identify 57 individual symptoms. Using a machine learning approach, we trained a logistic regression model on 75% of the data to predict which patients later developed PC, and tested the model's performance on the remaining 25%.
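The train/test procedure described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the five binary "symptom" features, the coefficients used to generate labels, the learning rate and the helper names (`make_synthetic`, `split_75_25`, `fit_logistic`, `predict_proba`) are all assumptions made for illustration, and plain batch gradient descent stands in for whatever fitting routine the study used. Only the 75%/25% split and the choice of logistic regression come from the abstract.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n, n_features, rng):
    """Hypothetical data: binary symptom flags; the label depends on the first two."""
    rows, labels = [], []
    for _ in range(n):
        x = [1 if rng.random() < 0.3 else 0 for _ in range(n_features)]
        p = sigmoid(-1.5 + 1.2 * x[0] + 0.8 * x[1])  # assumed generating model
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def split_75_25(rows, labels, rng):
    """Shuffle indices, then use 75% for training and the remaining 25% for testing."""
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(idx) * 0.75)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def predict_proba(w, b, x):
    """Predicted probability of being a case for one feature vector."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = make_synthetic(1000, 5, rng)
X_train, y_train, X_test, y_test = split_75_25(X, y, rng)
w, b = fit_logistic(X_train, y_train)
test_probs = [predict_proba(w, b, x) for x in X_test]
print(len(X_train), len(X_test))
```

The held-out probabilities in `test_probs` are what a threshold would be applied to in order to flag patients as 'high risk'; the model never sees the test rows during fitting.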
Results: We identified 41.3% of patients aged ≤60 years as being at 'high risk' of developing pancreatic cancer up to 20 months prior to diagnosis, with 72.5% sensitivity, 59% specificity and 66% AUC. Similarly, 43.2% of patients aged >60 years were identified at 17 months prior to diagnosis, with 65% sensitivity, 57% specificity and 61% AUC. We estimate that combining our algorithm with currently available biomarker tests could result in 30 older and 400 younger patients per cancer being identified as 'potential patients', and in the earlier diagnosis of around 60% of tumours.
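For readers less familiar with the metrics reported in the Results, the following sketch shows how sensitivity, specificity and AUC are conventionally computed from true labels and predicted risk scores. The eight labels and scores and the 0.5 threshold are toy values chosen for illustration, not study data, and the function names are hypothetical. AUC is computed here as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise case-versus-control comparisons."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy values (assumed): 1 = case, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off for calling a patient 'high risk'
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(sens, spec, area)  # 0.75 0.75 0.75
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking performance across all thresholds, which is why the abstract reports all three.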
Conclusion: After further work, this approach could be applied in the primary care setting and has the potential to be used alongside a non-invasive biomarker test to increase earlier diagnosis. This would result in a greater number of patients surviving this devastating disease.
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms. Sci Rep. 2025 Apr 5;15(1):11697. doi: 10.1038/s41598-025-89607-8. PMID: 40188106. Free PMC article.
- Predicting Pancreatic Cancer in New-Onset Diabetes Cohort Using a Novel Model With Integrated Clinical and Genetic Indicators: A Large-Scale Prospective Cohort Study. Cancer Med. 2024 Nov;13(21):e70388. doi: 10.1002/cam4.70388. PMID: 39526476. Free PMC article.
- Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557. Free PMC article.
- Pancreatic cancer symptom trajectories from Danish registry data and free text in electronic health records. Elife. 2023 Nov 21;12:e84919. doi: 10.7554/eLife.84919. PMID: 37988407. Free PMC article.
- Multi-cancer early detection tests for general population screening: a systematic literature review. Health Technol Assess. 2025 Jan;29(2):1-105. doi: 10.3310/DLMT1294. PMID: 39898371. Free PMC article.
Cited by
- An Integrative Pancreatic Cancer Risk Prediction Model in the UK Biobank. Biomedicines. 2023 Dec 1;11(12):3206. doi: 10.3390/biomedicines11123206. PMID: 38137427. Free PMC article.
- Application of artificial intelligence to pancreatic adenocarcinoma. Front Oncol. 2022 Jul 22;12:960056. doi: 10.3389/fonc.2022.960056. eCollection 2022. PMID: 35936738. Free PMC article. Review.
- Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms. Sci Rep. 2025 Apr 5;15(1):11697. doi: 10.1038/s41598-025-89607-8. PMID: 40188106. Free PMC article.
- Constructing multicancer risk cohorts using national data from medical helplines and secondary care. NPJ Digit Med. 2025 Aug 27;8(1):551. doi: 10.1038/s41746-025-01855-0. PMID: 40866501. Free PMC article.
- A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat Med. 2023 May;29(5):1113-1122. doi: 10.1038/s41591-023-02332-5. Epub 2023 May 8. PMID: 37156936. Free PMC article.