Artificial intelligence for early detection of lung cancer in GPs' clinical notes: a retrospective observational cohort study
- PMID: 40044183
- PMCID: PMC12040367
- DOI: 10.3399/BJGP.2023.0489
Abstract
Background: The journey of >80% of patients diagnosed with lung cancer starts in general practice. About 75% of patients are diagnosed when the cancer is already at an advanced stage (3 or 4), which at present leads to >80% mortality within 1 year. The long-term data in GP records might contain hidden information that could be used to find patients with cancer earlier.
Aim: To develop new prediction tools that improve the risk assessment for lung cancer.
Design and setting: Text analysis of electronic patient data using natural language processing and machine learning in the general practice files of four networks in the Netherlands.
Method: Files of 525 526 patients were analysed, of whom 2386 were diagnosed with lung cancer. Diagnoses were validated against the Dutch cancer registry, and both structured and free-text data were used to predict the diagnosis of lung cancer 5 months before diagnosis (4 months before referral).
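The 5-month prediction horizon means that, for each patient with lung cancer, only data recorded before a cut-off date may be used; notes written closer to diagnosis would leak the outcome into the predictors. A minimal Python sketch of that filtering step follows, assuming each consultation note carries a date and that the horizon is measured in calendar months. The `Note` type, both functions, and the toy notes are hypothetical illustrations, not the study's pipeline, and the abstract does not say how index dates were assigned to patients without lung cancer.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    """Hypothetical record: one dated free-text entry from a GP file."""
    day: date
    text: str


def subtract_months(d: date, months: int) -> date:
    """Go back `months` calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notes_before_cutoff(notes: list[Note], diagnosis_date: date, months: int = 5) -> list[Note]:
    """Keep only notes recorded strictly before the prediction cut-off date."""
    cutoff = subtract_months(diagnosis_date, months)
    return [n for n in notes if n.day < cutoff]


if __name__ == "__main__":
    # Toy notes (invented for illustration only)
    notes = [
        Note(date(2020, 1, 10), "persistent cough"),
        Note(date(2020, 5, 1), "referred for chest X-ray"),
    ]
    kept = notes_before_cutoff(notes, diagnosis_date=date(2020, 7, 31))
    print([n.text for n in kept])  # ['persistent cough']
```

Clamping the day (31 July minus 5 months gives 29 February in a leap year) avoids invalid dates; a fixed offset such as 150 days would be a simpler, slightly different choice.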
Results: The algorithm could facilitate earlier detection of lung cancer using routine general practice data. Discrimination, calibration, sensitivity, and specificity were assessed at various cut-off points of the prediction 5 months before diagnosis. Internal validation of the best model demonstrated an area under the curve of 0.88 (95% confidence interval [CI] = 0.86 to 0.89), which decreased to 0.79 (95% CI = 0.78 to 0.80) during external validation. The desired sensitivity determines the number of patients who need to be referred to detect one patient with lung cancer.
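How a cut-off translates into sensitivity, specificity, area under the curve, and the number of referrals per detected cancer can be illustrated with a small Python sketch. The function names, toy scores, and the sensitivity and specificity values used below are hypothetical; the abstract does not report them. The prevalence used is the crude proportion 2386/525 526 from the cohort, which is an assumption for illustration and is not the same as the risk within the 5-month prediction window. The number needed to refer is computed as 1/PPV.

```python
def sensitivity_specificity(scores: list[float], labels: list[int], cutoff: float) -> tuple[float, float]:
    """Flag a patient when the predicted risk is at or above `cutoff`; labels are 1 (cancer) or 0."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        flagged = score >= cutoff
        if label == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random case scores higher than a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:  # O(n^2) pairwise comparison, fine for small examples only
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def number_needed_to_refer(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Referrals per detected cancer, i.e. 1 / positive predictive value."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    return 1 / ppv


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    toy_labels = [1, 0, 1, 0, 0]
    print(auc(toy_scores, toy_labels))  # 5 of 6 case/non-case pairs ranked correctly
    print(sensitivity_specificity(toy_scores, toy_labels, cutoff=0.5))
    crude_prevalence = 2386 / 525526  # whole-cohort proportion, an illustrative assumption
    print(number_needed_to_refer(0.5, 0.99, crude_prevalence))
```

Because the positive predictive value depends on prevalence, a condition as rare as lung cancer in general practice can require several referrals per detected case even at high specificity; raising the cut-off lowers sensitivity but also reduces the number of referrals.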
Conclusion: Artificial intelligence-based support enables earlier detection of lung cancer in general practice using readily available text in the patient files of GPs, but needs additional prospective clinical evaluation.
Keywords: early detection; general practice; lung cancer; machine learning; natural language processing; oncology.
© The Authors.
Conflict of interest statement
The authors have declared no competing interests.