Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography

Michael H Bernstein et al. Eur Radiol. 2023 Nov;33(11):8263-8269. doi: 10.1007/s00330-023-09747-1. Epub 2023 Jun 2.

Abstract

Objective: To examine whether incorrect AI results impact radiologist performance, and if so, whether human factors can be optimized to reduce error.

Methods: In a multi-reader design, 6 radiologists interpreted the same 90 chest radiographs (deciding whether follow-up CT was needed: yes/no) on four occasions (09/2020-01/2022). No AI result was provided for session 1. Sham AI results were provided for sessions 2-4, and the AI results for 12 cases were manipulated to be incorrect (8 false positives (FP), 4 false negatives (FN)), yielding a 0.87 ROC-AUC. In the Delete AI (No Box) condition, radiologists were told the AI results would not be saved for the evaluation. In the Keep AI (No Box) and Keep AI (Box) conditions, radiologists were told the results would be saved. In Keep AI (Box), the ostensible AI program also visually outlined the region of suspicion. AI results were identical across conditions.
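As an illustrative aside (not from the paper): because the sham AI returned a binary yes/no result, its ROC-AUC reduces to balanced accuracy, (sensitivity + specificity) / 2. The minimal Python sketch below checks that the stated 0.87 ROC-AUC is consistent with 4 FNs and 8 FPs under a hypothetical case mix of 30 positive and 60 negative radiographs; the abstract does not report the actual mix.

    # Hypothetical consistency check for the sham AI's reported 0.87 ROC-AUC.
    # Assumption (not stated in the abstract): 30 positive and 60 negative
    # cases among the 90 radiographs. Known from the abstract: 4 FN, 8 FP.

    n_pos, n_neg = 30, 60                  # hypothetical case mix
    fn, fp = 4, 8                          # manipulated incorrect AI results

    sensitivity = (n_pos - fn) / n_pos     # true-positive rate = 26/30
    specificity = (n_neg - fp) / n_neg     # true-negative rate = 52/60

    # For a single-threshold (binary) classifier, ROC-AUC equals
    # balanced accuracy.
    auc = (sensitivity + specificity) / 2
    print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}  AUC={auc:.3f}")
    # -> sensitivity=0.867  specificity=0.867  AUC=0.867, i.e. ~0.87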

Results: Relative to the No AI condition (FN = 2.7%, FP = 51.4%), FNs and FPs were higher in the Keep AI (No Box) (FN = 33.0%, FP = 86.0%), Delete AI (No Box) (FN = 26.7%, FP = 80.5%), and Keep AI (Box) (FN = 20.7%, FP = 80.5%) conditions (all ps < 0.05). FNs were higher in the Keep AI (No Box) condition (33.0%) than in the Keep AI (Box) condition (20.7%) (p = 0.04). FPs were higher in the Keep AI (No Box) condition (86.0%) than in the Delete AI (No Box) condition (80.5%) (p = 0.03).
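For illustration only (this is not the authors' analysis code; the study presumably used an appropriate multi-reader statistical model): the sketch below shows how per-condition error percentages like those reported above could be tabulated from reader-by-case decisions on the AI-manipulated cases. All decision values here are made up.

    # Minimal sketch: tabulate error rates per condition from per-reader
    # decisions on the manipulated cases (1 = reader error, 0 = correct).
    # The nested lists below are hypothetical, not study data.
    from statistics import mean

    decisions = {
        "No AI":              [[0, 0, 0, 0], [0, 0, 0, 1]],
        "Keep AI (No Box)":   [[1, 1, 0, 1], [0, 1, 1, 1]],
        "Delete AI (No Box)": [[1, 0, 0, 1], [0, 1, 1, 0]],
        "Keep AI (Box)":      [[0, 1, 0, 1], [0, 0, 1, 1]],
    }

    for condition, readers in decisions.items():
        # Average within each reader, then across readers, mirroring the
        # percentages reported in the Results.
        rate = mean(mean(errors) for errors in readers)
        print(f"{condition:20s} error rate = {rate:.1%}")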

Conclusion: Incorrect AI results cause radiologists to make incorrect follow-up decisions on cases they would have judged correctly without AI. This effect is mitigated when radiologists believe the AI result will be deleted from the patient's file or when a box is provided around the region of interest.

Clinical relevance statement: When AI is wrong, radiologists make more errors than they would have without AI. Based on human factors psychology, our manuscript provides evidence for two AI implementation strategies that reduce the deleterious effects of incorrect AI.

Key points: • When AI provided incorrect results, false negative and false positive rates among the radiologists increased. • False positives decreased when AI results were deleted, versus kept, in the patient's record. • False negatives and false positives decreased when AI visually outlined the region of suspicion.

Keywords: Artificial intelligence; Cognitive science; Psychology.


Conflict of interest statement

The authors of this manuscript declare a relationship with the following company: Lunit. Authors MHB and GLB are forming a relationship with the AI company Lunit; the study published here began prior to that work, and Lunit had no involvement in any aspect of this study.

Figures

Fig. 1. Study flow chart. Overall experimental procedure is displayed. The order of Keep AI (No Box) and Delete AI (No Box) was counterbalanced such that n = 3 participated in Keep AI (No Box) first, and n = 2 participated in Delete AI (No Box) first
Fig. 2. Procedural overview. A brief description of the major procedural elements within each session is shown
Fig. 3. False negatives (incorrect AI feedback) by experimental condition. False negative percent (y-axis) is shown for the No AI (red), Keep AI (No Box) (brown), Delete AI (No Box) (green), and Keep AI (Box) (blue) conditions (x-axis). Mean (circle) and 95% confidence intervals are displayed. Results display the four conditions for the 4 cases where AI provided false negative feedback
Fig. 4. False positives (incorrect AI feedback) by experimental condition. False positive percent (y-axis) is shown for the No AI (red), Keep AI (No Box) (brown), Delete AI (No Box) (green), and Keep AI (Box) (blue) conditions (x-axis). Mean (circle) and 95% confidence intervals are displayed. Results display the four conditions for the 8 cases where AI provided false positive feedback
