Meta-Analysis

Artificial intelligence in commercial fracture detection products: a systematic review and meta-analysis of diagnostic test accuracy

Julius Husarek et al. Sci Rep. 2024 Oct 4;14(1):23053. doi: 10.1038/s41598-024-73058-8.

Abstract

Conventional radiography (CR) is primarily utilized for fracture diagnosis. Artificial intelligence (AI) for CR is a rapidly growing field aimed at enhancing efficiency and increasing diagnostic accuracy. However, the diagnostic performance of commercially available AI fracture detection solutions (CAAI-FDS) for CR in various anatomical regions, their synergy with human assessment, and the influence of industry funding on reported accuracy are unknown. Peer-reviewed diagnostic test accuracy (DTA) studies were identified through a systematic review of PubMed and Embase. Diagnostic performance measures were extracted, in particular for subgroups such as product, type of rater (stand-alone AI, human unaided, human AI-aided), funding, and anatomical region. Pooled measures were obtained with a bivariate random-effects model. The impact of the type of rater was evaluated with comparative meta-analysis. Seventeen DTA studies of seven CAAI-FDS analyzing 38,978 x-rays with 8,150 fractures were included. Stand-alone AI studies (n = 15) evaluated five CAAI-FDS: four with good sensitivity (> 90%) and moderate specificity (80-90%), and one with very poor sensitivity (< 60%) and excellent specificity (> 95%). Pooled sensitivities were good to excellent and specificities were moderate to good in all anatomical regions (n = 7) apart from ribs (n = 4; poor sensitivity / moderate specificity) and spine (n = 4; excellent sensitivity / poor specificity). Industry-funded studies (n = 4) had higher sensitivity (+5%) and lower specificity (-4%) than non-funded studies (n = 11). Sensitivity did not differ significantly between stand-alone AI and human AI-aided ratings (p = 0.316), but specificity was significantly higher in the latter group (p < 0.001). Sensitivity was significantly lower in human unaided ratings than in human AI-aided and stand-alone AI ratings (both p ≤ 0.001); specificity was higher in human unaided ratings than in stand-alone AI ratings (p < 0.001) and did not differ significantly from human AI-aided ratings (p = 0.316). The study demonstrates good diagnostic accuracy across most CAAI-FDS and anatomical regions, with the highest performance achieved when AI is used in conjunction with human assessment. Diagnostic accuracy appears lower for spine and rib fractures. The impact of industry funding on reported performance is small.
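For readers unfamiliar with the method named above, one standard formulation of a bivariate random-effects model for pooling sensitivity and specificity is sketched below; the abstract names only the model class, so the exact parameterization the authors used may differ.

\[
\begin{aligned}
TP_i &\sim \mathrm{Binomial}\bigl(n_{1i},\ \mathrm{Se}_i\bigr), \qquad
TN_i \sim \mathrm{Binomial}\bigl(n_{0i},\ \mathrm{Sp}_i\bigr),\\[4pt]
\begin{pmatrix}\operatorname{logit}(\mathrm{Se}_i)\\ \operatorname{logit}(\mathrm{Sp}_i)\end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix}\mu_{Se}\\ \mu_{Sp}\end{pmatrix},\
\begin{pmatrix}\sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp}\\ \rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}\end{pmatrix}
\right),
\end{aligned}
\]

where TP_i and TN_i are the true positives and true negatives of study i, n_{1i} and n_{0i} the numbers of radiographs with and without fracture, and the pooled sensitivity and specificity are obtained by back-transforming \mu_{Se} and \mu_{Sp} (e.g., Se = expit(\mu_{Se})).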


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. PRISMA flowchart.

Fig. 2. Diagnostic accuracy with 95% confidence interval (CI) according to stand-alone AI, human unaided and human aided rater (total). Generalized I² values: Artificial intelligence 0.86; Human aided 0.87; Human unaided 0.94; Overall 0.79.

Fig. 3. Diagnostic accuracy with 95% confidence interval (CI) according to AI fracture detection product. Generalized I² values: BoneView 0.82; Enterprise CXR TT -; FractureDetect -; Rayvolve < 0.01; SmartUrgence -; Overall 0.85.

Fig. 4. Diagnostic accuracy with 95% confidence interval (CI) for stand-alone AI according to body region. Generalized I² values: Ankle/Foot < 0.01; Elbow/Arm 0.01; Hand/Wrist < 0.01; Knee/Leg 0.02; Pelvis/Hip < 0.01; Ribs < 0.01; Shoulder/Clavicle < 0.01; Spine < 0.01; Overall 0.18.

Fig. 5. Sensitivity and specificity with 95% confidence interval (CI) for stand-alone AI according to reference standard. Generalized I² values: Expert consensus 0.73; Others 0.85; Overall 0.86.

Fig. 6. Sensitivity and specificity with 95% confidence interval (CI) for stand-alone AI according to funding status. Generalized I² values: Industry funding 0.80; Other/no funding 0.86; Overall 0.86.

Fig. 7. Sensitivity and specificity with 95% confidence interval (CI) for stand-alone AI according to risk-of-bias (RoB) category. Generalized I² values: Low 0.87; Moderate 0.87; High -; Overall 0.86.

Fig. 8. Diagnostic accuracy with 95% confidence interval (CI) depending on the type of rater (stand-alone AI and human aided/unaided).
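The generalized I² reported in the figure legends extends the familiar univariate I² heterogeneity statistic to the joint sensitivity/specificity setting. As a rough illustration only, not the authors' code, and using made-up study counts, a univariate I² on logit-transformed sensitivities can be computed as follows:

    import math

    def logit_and_variance(events, total):
        # Logit-transformed proportion and its approximate variance,
        # with a 0.5 continuity correction to avoid division by zero.
        e, n = events + 0.5, total + 1.0
        p = e / n
        return math.log(p / (1.0 - p)), 1.0 / e + 1.0 / (n - e)

    def q_and_i_squared(estimates, variances):
        # Cochran's Q and univariate I^2 using inverse-variance (fixed-effect) weights.
        weights = [1.0 / v for v in variances]
        pooled = sum(w * t for w, t in zip(weights, estimates)) / sum(weights)
        q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, estimates))
        df = len(estimates) - 1
        i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
        return q, i2

    # Hypothetical per-study counts (true positives, radiographs with fracture), for illustration only.
    studies = [(90, 100), (180, 200), (45, 60)]
    logits, variances = zip(*(logit_and_variance(tp, n) for tp, n in studies))
    q, i2 = q_and_i_squared(list(logits), list(variances))
    print(f"Q = {q:.2f}, I^2 = {i2:.0%}")

Whether the generalized statistic in the paper reduces to this simple form is not stated; the sketch is meant only as intuition for what the reported values quantify, namely the share of observed variability attributable to between-study heterogeneity rather than chance.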

