BMJ Open. 2023 Nov 8;13(11):e077348. doi: 10.1136/bmjopen-2023-077348

Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population


Ahmed Maiter et al. BMJ Open.

Abstract

Objectives: Early identification of lung cancer on chest radiographs improves patient outcomes. Artificial intelligence (AI) tools may increase diagnostic accuracy and streamline this pathway. This study evaluated the performance of commercially available AI-based software trained to identify cancerous lung nodules on chest radiographs.

Design: This retrospective study included primary care chest radiographs acquired in a UK centre. The software evaluated each radiograph independently and outputs were compared with two reference standards: (1) the radiologist report and (2) the diagnosis of cancer by multidisciplinary team decision. Failure analysis was performed by interrogating the software marker locations on radiographs.

Participants: 5722 consecutive chest radiographs were included from 5592 patients (median age 59 years, 53.8% women, 1.6% prevalence of cancer).

Results: Compared with radiologist reports for nodule detection, the software demonstrated sensitivity 54.5% (95% CI 44.2% to 64.4%), specificity 83.2% (82.2% to 84.1%), positive predictive value (PPV) 5.5% (4.6% to 6.6%) and negative predictive value (NPV) 99.0% (98.8% to 99.2%). Compared with cancer diagnosis, the software demonstrated sensitivity 60.9% (50.1% to 70.9%), specificity 83.3% (82.3% to 84.2%), PPV 5.6% (4.8% to 6.6%) and NPV 99.2% (99.0% to 99.4%). Normal or variant anatomy was misidentified as an abnormality in 69.9% of the 943 false positive cases.
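
For context, these values follow directly from the 2x2 confusion matrix of software output against each reference standard. The sketch below is illustrative only: the counts (tp, fp, fn, tn) are hypothetical placeholders rather than the study's data, and it simply shows how sensitivity, specificity, PPV and NPV are derived.

    # Illustrative sketch, assuming hypothetical confusion-matrix counts (not the study's data).
    def diagnostic_metrics(tp, fp, fn, tn):
        """Derive standard diagnostic accuracy metrics from confusion-matrix counts."""
        return {
            "sensitivity": tp / (tp + fn),  # proportion of true abnormal cases the software flags
            "specificity": tn / (tn + fp),  # proportion of normal cases correctly left unflagged
            "ppv": tp / (tp + fp),          # probability a flagged case is truly abnormal
            "npv": tn / (tn + fn),          # probability an unflagged case is truly normal
        }

    # Hypothetical example counts for illustration only
    print(diagnostic_metrics(tp=60, fp=940, fn=40, tn=4680))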

Conclusions: The software demonstrated considerable underperformance in this real-world patient cohort. Failure analysis suggested a lack of generalisability in the training and testing datasets as a potential factor. The low PPV carries the risk of over-investigation and limits the translation of the software to clinical practice. Our findings highlight the importance of training and testing software in representative datasets, with broader implications for the implementation of AI tools in imaging.

Keywords: chest imaging; diagnostic imaging; diagnostic radiology.


Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Overview of study design. No radiographs or patients were excluded. MDT, multidisciplinary team.
Figure 2
(A) Number of patients by age group and sex. (B) Documented patient ethnicities.
Figure 3
Examples of true positive cancerous nodule identification by the software. The circles represent markers placed at the location of the detected abnormality by the artificial intelligence (AI) tool (note that these are not contours of the abnormality). (A) Primary right middle lobe lung cancer. (B) Primary left upper lobe lung cancer. (C) Right upper lobe lung metastasis.
Figure 4
Examples of false negative results. The white circles represent markers placed at the location of the detected abnormality by the artificial intelligence tool (note that these are not contours of the abnormality). The magenta circles have been added manually to indicate the location of the missed true abnormalities. (A) Missed left lower lobe cancer (magenta circle); the software has also misidentified the left first rib as a false positive abnormality. (B) Missed cancerous left hilar nodule; the software has misidentified the pacemaker as a false positive abnormality.
Figure 5
Examples of false positive abnormalities detected by the software. The circles represent markers placed at the location of the detected abnormality by the artificial intelligence tool (note that these are not contours of the abnormality). (A) End of the left first rib. (B) Composite left perihilar shadows. (C) Right nipple shadow. (D) Pacemaker. (E) Old left rib fracture. (F) Right breast implant.

