Meta-Analysis

Radiology. 2022 Jul;304(1):50-62. doi: 10.1148/radiol.211785. Epub 2022 Mar 29.

Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis


Rachel Y L Kuo et al. Radiology. 2022 Jul.

Abstract

Background: Patients with fractures are a common emergency presentation and may be misdiagnosed at radiologic imaging. An increasing number of studies apply artificial intelligence (AI) techniques to fracture detection as an adjunct to clinician diagnosis.

Purpose: To perform a systematic review and meta-analysis comparing the diagnostic performance in fracture detection between AI and clinicians in peer-reviewed publications and the gray literature (ie, articles published on preprint repositories).

Materials and Methods: A search of multiple electronic databases between January 2018 and July 2020 (updated June 2021) was performed that included any primary research studies that developed and/or validated AI for the purposes of fracture detection at any imaging modality and excluded studies that evaluated image segmentation algorithms. Meta-analysis with a hierarchical model was used to calculate pooled sensitivity and specificity. Risk of bias was assessed by using a modified Prediction Model Study Risk of Bias Assessment Tool (PROBAST) checklist.

Results: Forty-two studies were included for analysis, with 115 contingency tables extracted from 32 studies (55 061 images). Thirty-seven studies identified fractures on radiographs and five on CT images. For internal validation test sets, the pooled sensitivity was 92% (95% CI: 88, 93) for AI and 91% (95% CI: 85, 95) for clinicians, and the pooled specificity was 91% (95% CI: 88, 93) for AI and 92% (95% CI: 89, 92) for clinicians. For external validation test sets, the pooled sensitivity was 91% (95% CI: 84, 95) for AI and 94% (95% CI: 90, 96) for clinicians, and the pooled specificity was 91% (95% CI: 81, 95) for AI and 94% (95% CI: 91, 95) for clinicians. There were no statistically significant differences between clinician and AI performance. Twenty-two of 42 (52%) studies were judged to have high risk of bias. Meta-regression identified multiple sources of heterogeneity in the data, including risk of bias and fracture type.

Conclusion: Artificial intelligence (AI) and clinicians had comparable reported diagnostic performance in fracture detection, suggesting that AI technology holds promise as a diagnostic adjunct in future clinical practice.

Clinical trial registration no. CRD42020186641

© RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Cohen and McInnes in this issue.
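The review pools diagnostic accuracy from 2 × 2 contingency tables. As a minimal illustration only (not the hierarchical bivariate model the authors used for pooling), the per-study sensitivity and specificity that feed such a meta-analysis can be computed from a single table like so; the counts below are hypothetical:

```python
def sens_spec(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Per-study sensitivity and specificity from a 2x2 contingency table.

    tp/fn: fractures correctly detected / missed.
    fp/tn: non-fractures flagged as fracture / correctly cleared.
    """
    sensitivity = tp / (tp + fn)  # true-positive rate among actual fractures
    specificity = tn / (tn + fp)  # true-negative rate among non-fractures
    return sensitivity, specificity

# Hypothetical study: 92 true positives, 8 false negatives,
# 9 false positives, 91 true negatives.
sens, spec = sens_spec(tp=92, fp=9, fn=8, tn=91)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# sensitivity=0.92, specificity=0.91
```

The hierarchical model in the paper goes further than this sketch: it jointly models sensitivity and specificity across studies, accounting for between-study heterogeneity and the correlation between the two measures.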


Conflict of interest statement

Disclosures of conflicts of interest: R.Y.L.K. No relevant relationships. C.H. No relevant relationships. T.A.C. No relevant relationships. B.J. No relevant relationships. A.F. No relevant relationships. D.C. No relevant relationships. M.S. No relevant relationships. G.S.C. No relevant relationships. D.F. Chair, British Society for Surgery of the Hand Research Committee; member, British Association of Plastic, Reconstructive, and Aesthetic Surgeons Research Committee; member, British Lymphology Society Research Committee; chair, Scientific Advisory Committee Restore Research; Trustee, British Dupuytren Society.

Figures

Graphical abstract

Figure 1:
Preferred Reporting Items for Systematic Reviews and Meta-Analyses flowchart shows studies selected for review. ACM = Association for Computing Machinery, AI = artificial intelligence, CENTRAL = Central Register of Controlled Trials, CINAHL = Cumulative Index to Nursing and Allied Health Literature, IEEE = Institute of Electrical and Electronics Engineers and Institution of Engineering and Technology.
Figure 2:
Summary of study adherence to Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines.
Figure 3:
Summary of Prediction Model Study Risk of Bias Assessment Tool (PROBAST) risk of bias and concern about generalizability scores.
Figure 4:
Hierarchical summary receiver operating characteristic (HSROC) curves for (A) fracture detection algorithms and (B) clinicians with internal validation test sets. The 95% prediction region is a visual representation of between-study heterogeneity.
Figure 5:
Hierarchical summary receiver operating characteristic (HSROC) curves for (A) fracture detection algorithms and (B) clinicians with external validation test sets. The 95% prediction region is a visual representation of between-study heterogeneity.
Figure 6:
Summary of pooled sensitivity, specificity, and area under the curve (AUC) of algorithms and clinicians comparing all studies versus low-bias studies with 95% CIs.


