Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography
- PMID: 37266657
- PMCID: PMC10235827
- DOI: 10.1007/s00330-023-09747-1
Abstract
Objective: To examine whether incorrect AI results impact radiologist performance, and if so, whether human factors can be optimized to reduce error.
Methods: In a multi-reader design, 6 radiologists interpreted the same 90 chest radiographs (deciding whether follow-up CT was needed: yes/no) on four occasions (09/20-01/22). No AI result was provided in session 1. Sham AI results were provided in sessions 2-4, and the AI results for 12 cases were manipulated to be incorrect (8 false positives (FP), 4 false negatives (FN)), yielding an apparent ROC-AUC of 0.87. In the Delete AI (No Box) condition, radiologists were told the AI results would not be saved for the evaluation. In the Keep AI (No Box) and Keep AI (Box) conditions, radiologists were told the results would be saved. In Keep AI (Box), the ostensible AI program also visually outlined the region of suspicion. AI results were constant across conditions.
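To make the reported error metrics concrete, the following is a minimal, purely illustrative Python sketch (not the authors' analysis code) of how per-condition false-negative and false-positive rates could be computed from reader decisions; the table layout, column names, and values are hypothetical.

```python
# Illustrative sketch, not the study's analysis code: per-condition
# false-negative (FN) and false-positive (FP) rates from reader decisions.
# All column names and values below are hypothetical.
import pandas as pd

# One row per (reader, condition, case): "truth" = follow-up CT actually needed,
# "decision" = reader recommended follow-up CT.
df = pd.DataFrame({
    "reader":    ["R1", "R1", "R2", "R2"],
    "condition": ["No AI", "Keep AI (No Box)", "No AI", "Keep AI (No Box)"],
    "truth":     [1, 1, 0, 0],
    "decision":  [1, 0, 0, 1],
})

def error_rates(g: pd.DataFrame) -> pd.Series:
    """FN rate over truly positive cases; FP rate over truly negative cases."""
    pos, neg = g[g["truth"] == 1], g[g["truth"] == 0]
    return pd.Series({
        "FN_rate": (pos["decision"] == 0).mean() if len(pos) else float("nan"),
        "FP_rate": (neg["decision"] == 1).mean() if len(neg) else float("nan"),
    })

# Error rates aggregated within each viewing condition.
print(df.groupby("condition").apply(error_rates))
```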
Results: Relative to the No AI condition (FN = 2.7%, FP = 51.4%), FN and FP rates were higher in the Keep AI (No Box) (FN = 33.0%, FP = 86.0%), Delete AI (No Box) (FN = 26.7%, FP = 80.5%), and Keep AI (Box) (FN = 20.7%, FP = 80.5%) conditions (all ps < 0.05). FN rates were higher in the Keep AI (No Box) condition (33.0%) than in the Keep AI (Box) condition (20.7%) (p = 0.04). FP rates were higher in the Keep AI (No Box) condition (86.0%) than in the Delete AI (No Box) condition (80.5%) (p = 0.03).
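The abstract does not state which statistical test produced the reported p-values. As a hedged illustration only, the sketch below runs an exact McNemar test on hypothetical paired case-level error indicators (same readers and cases under two conditions); the actual analysis in the paper may differ.

```python
# Hedged illustration (assumed approach, not stated in the abstract): exact
# McNemar test comparing paired error indicators between two conditions.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired data: 1 = reader erred on that case, 0 = correct,
# aligned so index i is the same reader/case pair in both conditions.
errors_keep_nobox = np.array([1, 1, 0, 1, 0, 1, 1, 0])
errors_keep_box   = np.array([0, 1, 0, 0, 0, 1, 1, 0])

# 2x2 table of agreement/disagreement between the two conditions.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(errors_keep_nobox, errors_keep_box):
    table[a, b] += 1

result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"McNemar statistic={result.statistic}, p-value={result.pvalue:.3f}")
```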
Conclusion: Incorrect AI results cause radiologists to make incorrect follow-up decisions on cases they would have judged correctly without AI. This effect is mitigated when radiologists believe the AI results will be deleted from the patient's file or when the AI outlines the region of suspicion with a box.
Clinical relevance statement: When AI is wrong, radiologists make more errors than they would have without AI. Based on human factors psychology, our manuscript provides evidence for two AI implementation strategies that reduce the deleterious effects of incorrect AI.
Key points: • When AI provided incorrect results, false negative and false positive rates among the radiologists increased. • False positives decreased when AI results were deleted, versus kept, in the patient's record. • False negatives and false positives decreased when AI visually outlined the region of suspicion.
Keywords: Artificial intelligence; Cognitive science; Psychology.
© 2023. The Author(s).
Conflict of interest statement
The authors of this manuscript declare relationships with the following companies: Lunit. Authors MHB and GLB are forming a relationship with the AI company, Lunit. The study published here began prior to our work with Lunit. Lunit had no involvement in any aspect of this study.
Similar articles
- AI-based improvement in lung cancer detection on chest radiographs: results of a multi-reader study in NLST dataset. Eur Radiol. 2021 Dec;31(12):9664-9674. doi: 10.1007/s00330-021-08074-7. PMID: 34089072
- AI for fracture diagnosis in clinical practice: Four approaches to systematic AI-implementation and their impact on AI-effectiveness. Eur J Radiol. 2025 Jun;187:112113. doi: 10.1016/j.ejrad.2025.112113. PMID: 40252277
- Artificial intelligence system for identification of false-negative interpretations in chest radiographs. Eur Radiol. 2022 Jul;32(7):4468-4478. doi: 10.1007/s00330-022-08593-x. PMID: 35195744
- Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ. 2021 Sep 1;374:n1872. doi: 10.1136/bmj.n1872. PMID: 34470740
- AI applications in musculoskeletal imaging: a narrative review. Eur Radiol Exp. 2024 Feb 15;8(1):22. doi: 10.1186/s41747-024-00422-8. PMID: 38355767
Cited by
- False Positives in Artificial Intelligence Prioritization Software for Intracranial Hemorrhage Identification in the Postoperative Period: A Report of Two Cases. Cureus. 2023 Aug 27;15(8):e44215. doi: 10.7759/cureus.44215. PMID: 37641727
- Radiological data processing system: lifecycle management and annotation. Int J Comput Assist Radiol Surg. 2025 Jun 20. doi: 10.1007/s11548-025-03430-0. PMID: 40540198
- Subgroup evaluation to understand performance gaps in deep learning-based classification of regions of interest on mammography. PLOS Digit Health. 2025 Apr 8;4(4):e0000811. doi: 10.1371/journal.pdig.0000811. PMID: 40198652
- Clinically Meaningful AI Detection of Interval Breast Cancer at Digital Breast Tomosynthesis Screening. Radiology. 2025 Jul;316(1):e251860. doi: 10.1148/radiol.251860. PMID: 40728395
- A Thorough Review of the Clinical Applications of Artificial Intelligence in Lung Cancer. Cancers (Basel). 2025 Mar 4;17(5):882. doi: 10.3390/cancers17050882. PMID: 40075729