Can artificial intelligence pass the Fellowship of the Royal College of Radiologists examination? Multi-reader diagnostic accuracy study

Susan Cheng Shelmerdine et al. BMJ. 2022 Dec 21;379:e072826. doi: 10.1136/bmj-2022-072826.

Abstract

Objective: To determine whether an artificial intelligence candidate could pass the rapid (radiographic) reporting component of the Fellowship of the Royal College of Radiologists (FRCR) examination.

Design: Prospective multi-reader diagnostic accuracy study.

Setting: United Kingdom.

Participants: One artificial intelligence candidate (Smarturgences, Milvue) and 26 radiologists who had passed the FRCR examination in the preceding 12 months.

Main outcome measures: Accuracy and pass rate of the artificial intelligence compared with radiologists across 10 mock FRCR rapid reporting examinations (each examination containing 30 radiographs, requiring a 90% accuracy rate to pass).

Results: When non-interpretable images were excluded from the analysis, the artificial intelligence candidate achieved an average overall accuracy of 79.5% (95% confidence interval 74.1% to 84.3%) and passed two of 10 mock FRCR examinations. The average radiologist achieved an average accuracy of 84.8% (76.1-91.9%) and passed four of 10 mock examinations. The sensitivity for the artificial intelligence was 83.6% (95% confidence interval 76.2% to 89.4%) and the specificity was 75.2% (66.7% to 82.5%), compared with summary estimates across all radiologists of 84.1% (81.0% to 87.0%) and 87.3% (85.0% to 89.3%). Across 148/300 radiographs that were correctly interpreted by >90% of radiologists, the artificial intelligence candidate was incorrect in 14/148 (9%). In 20/300 radiographs that most (>50%) radiologists interpreted incorrectly, the artificial intelligence candidate was correct in 10/20 (50%). Most imaging pitfalls related to interpretation of musculoskeletal rather than chest radiographs.
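For readers less familiar with these summary measures, the sketch below (Python, using hypothetical counts rather than data from this study) shows how accuracy, sensitivity, and specificity follow from a 2x2 confusion matrix of reader decisions, and how the 90% pass mark for a 30-radiograph rapid reporting examination corresponds to at least 27 correct reports.

    # Illustrative sketch only: the counts below are hypothetical, not taken from this study.
    def diagnostic_metrics(tp, fp, tn, fn):
        """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix."""
        total = tp + fp + tn + fn
        return {
            "accuracy": (tp + tn) / total,   # proportion of all radiographs reported correctly
            "sensitivity": tp / (tp + fn),   # abnormal radiographs correctly reported as abnormal
            "specificity": tn / (tn + fp),   # normal radiographs correctly reported as normal
        }

    def passes_rapid_reporting(correct, total=30, pass_mark=0.90):
        """Pass rule for a rapid reporting paper: at least 90% of 30 radiographs correct (27/30)."""
        return correct / total >= pass_mark

    # Hypothetical single 30-image mock examination
    print(diagnostic_metrics(tp=13, fp=3, tn=12, fn=2))  # accuracy 0.83, sensitivity 0.87, specificity 0.80
    print(passes_rapid_reporting(correct=25))            # False: 25/30 = 83%, below the 90% pass mark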

Conclusions: When special dispensation for the artificial intelligence candidate was provided (that is, exclusion of non-interpretable images), the artificial intelligence candidate was able to pass two of 10 mock examinations. Potential exists for the artificial intelligence candidate to improve its radiographic interpretation skills by focusing on musculoskeletal cases and learning to interpret radiographs of the axial skeleton and abdomen that are currently considered "non-interpretable."


Conflict of interest statement

Competing interests: All authors have completed the ICMJE uniform disclosure form at https://www.icmje.org/disclosure-of-interest/ and declare: support from the National Institute for Health Research; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; SS is the organiser of a radiology revision course mentioned in this study and helped to recruit radiologist readers to the project, but this relationship had no influence on the reported results of the work and no financial incentive was provided; no other relationships or activities that could appear to have influenced the submitted work.

Figures

Fig 1
Bar charts showing examination percentage scores per Fellowship of the Royal College of Radiologists mock examination, and overall, acquired by artificial intelligence (AI) candidate and radiologist participants in scenario 1 for only “AI interpretable” images (top) and scenario 4 for all images (bottom). Whisker plots denote standard deviation of scores around mean value by all 26 radiologist participants
Fig 2
Plot of individual sensitivity and false positive rates of 26 radiologists and artificial intelligence (AI), based on scenario 1 (only “AI interpretable” images). Bivariate random effects summary receiver operating characteristic (SROC) curve and summary estimate for radiologists are included for comparison with the AI candidate
Fig 3
Normal paediatric abdominal radiograph interpreted by artificial intelligence (AI) candidate as having right basal pneumothorax with dashed bounding box (false positive result). This should have been identified as non-interpretable by AI. French translation: positif=positive; doute=doubt; epanchement pleural=pleural effusion; luxation=dislocation; negatif=negative; nodule pulmonaire=pulmonary nodule; opacite pulmonaire=pulmonary opacification
Fig 4
Dorsoplantar and oblique views of abnormal right foot radiograph in adult showing osteochondral defect at talar dome (white dashed arrow). This particularly challenging finding was missed by all 26 radiologists and also by the artificial intelligence candidate (false negative). French translation: doute=doubt; luxation=dislocation; negatif=negative; positif=positive
Fig 5
Dorsoplantar and oblique views of abnormal right foot radiograph in adult showing acute fracture of proximal phalanx of big toe, correctly interpreted by less than half of radiologists (46%) and correctly identified by artificial intelligence candidate (dashed bounding box). French translation: doute=doubt; luxation=dislocation; negatif=negative; positif=positive
Fig 6
Abnormal adult pelvic radiograph showing increased sclerosis and expansion of right iliac bone in keeping with Paget’s disease. This was correctly identified by almost all radiologists (96%) but interpreted as normal by artificial intelligence candidate (false negative), given that this was not a pathology it was trained to identify. French translation: doute=doubt; luxation=dislocation; negatif=negative; positif=positive
Fig 7
Normal lateral scapular Y view of right shoulder in child, incorrectly interpreted by artificial intelligence candidate as having proximal humeral fracture (dashed bounding box). This was a false positive result; the radiograph was correctly identified as normal by all 26 radiologists. French translation: doute=doubt; epanchement articulaire=joint effusion; negatif=negative; positif=positive

