Observational Study
Br J Gen Pract. 2025 May 2;75(754):e316-e322.
doi: 10.3399/BJGP.2023.0489. Print 2025 May.

Artificial intelligence for early detection of lung cancer in GPs' clinical notes: a retrospective observational cohort study

Martijn C Schut et al. Br J Gen Pract.

Abstract

Background: The journey of >80% of patients diagnosed with lung cancer starts in general practice. About 75% of patients are diagnosed when the disease is at an advanced stage (3 or 4), which currently leads to >80% mortality within 1 year. The long-term data in GP records might contain hidden information that could be used for earlier case finding of patients with cancer.

Aim: To develop new prediction tools that improve the risk assessment for lung cancer.

Design and setting: Text analysis of electronic patient data using natural language processing and machine learning in the general practice files of four networks in the Netherlands.

Method: Files of 525 526 patients were analysed, of whom 2386 were diagnosed with lung cancer. Diagnoses were validated by using the Dutch cancer registry, and both structured and free-text data were used to predict the diagnosis of lung cancer 5 months before diagnosis (4 months before referral).

Results: The algorithm could facilitate earlier detection of lung cancer using routine general practice data. Discrimination, calibration, sensitivity, and specificity were established under various cut-off points of the prediction 5 months before diagnosis. Internal validation of the best model demonstrated an area under the curve of 0.88 (95% confidence interval [CI] = 0.86 to 0.89), which fell to 0.79 (95% CI = 0.78 to 0.80) during external validation. The desired sensitivity determines the number of patients to be referred to detect one patient with lung cancer.
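
The link between the chosen sensitivity and the number of patients referred per detected case can be illustrated with a short sketch. It assumes the standard relationship in which the number needed to refer is the reciprocal of the positive predictive value (PPV). The sensitivity and specificity pairs below are hypothetical operating points chosen for illustration, not results from the study, and the prevalence is simply the crude proportion of cases in the cohort (2386 of 525 526), which may differ from the prevalence at the 5-month prediction horizon.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def number_needed_to_refer(sensitivity: float, specificity: float,
                           prevalence: float) -> float:
    """Patients flagged per true case found, taken here as 1 / PPV."""
    return 1.0 / ppv(sensitivity, specificity, prevalence)


# Crude cohort proportion from the Method section (an assumption for this sketch).
prevalence = 2386 / 525526

# Hypothetical cut-off points: higher sensitivity paired with lower specificity.
hypothetical_cutoffs = [(0.50, 0.95), (0.70, 0.85), (0.90, 0.60)]

for sens, spec in hypothetical_cutoffs:
    nnr = number_needed_to_refer(sens, spec, prevalence)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
          f"PPV={ppv(sens, spec, prevalence):.4f} referrals per case={nnr:.0f}")
```

At low prevalence, a cut-off that raises sensitivity at the cost of specificity increases the number of referrals per detected case, which is the trade-off the last sentence of the Results describes.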

Conclusion: Artificial intelligence-based support enables earlier detection of lung cancer in general practice using readily available text in the patient files of GPs, but needs additional prospective clinical evaluation.

Keywords: early detection; general practice; lung cancer; machine learning; natural language processing; oncology.

Conflict of interest statement

The authors have declared no competing interests.

Figures

Figure 1. Precision-recall curves of the phrase skip-gram (PSG) model when used with text only (TO) and when used with text and coded data (TC). PPV = positive predictive value. PSGNN = phrase skip-gram neural network.

Figure 2. Calibration plot of the phrase skip-gram (PSG) model when used with text only (TO) and when used with text and coded data (TC). The plot shows actual probabilities (y-axis) versus predicted probabilities (x-axis); an ideal curve (dotted line) shows where the predicted probabilities equal the actual probabilities. PSGNN = phrase skip-gram neural network.
