Clin Orthop Relat Res. 2019 Nov;477(11):2482-2491.
doi: 10.1097/CORR.0000000000000848.

What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review


David W G Langerhuizen et al. Clin Orthop Relat Res. 2019 Nov.

Abstract

Background: Artificial-intelligence algorithms derive rules and patterns from large amounts of data to calculate the probabilities of various outcomes using new sets of similar data. In medicine, artificial intelligence (AI) has been applied primarily to image-recognition diagnostic tasks and to evaluating the probabilities of particular outcomes after treatment. However, the performance and limitations of AI in the automated detection and classification of fractures have not been examined comprehensively.

Question/purposes: In this systematic review, we asked (1) What is the proportion of correctly detected or classified fractures, and the area under the receiver operating characteristic curve (AUC), of AI fracture detection and classification models? (2) What is the performance of AI in this setting compared with the performance of human examiners?

Methods: The PubMed, Embase, and Cochrane databases were systematically searched from the start of each respective database until September 6, 2018, using terms related to "fracture", "artificial intelligence", and "detection, prediction, or evaluation." Of 1221 identified studies, we retained 10: eight involved fracture detection (ankle, hand, hip, spine, wrist, and ulna), one addressed fracture classification (diaphyseal femur), and one addressed both fracture detection and classification (proximal humerus). We registered the review before data collection (PROSPERO: CRD42018110167) and followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). We reported the range of the accuracy and AUC for the performance of the predicted fracture detection and/or classification task. An AUC of 1.0 would indicate perfect prediction, whereas 0.5 would indicate a prediction no better than a coin flip. We conducted quality assessment using a seven-item checklist based on a modified methodologic index for nonrandomized studies (MINORS) instrument.
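The AUC interpretation above (1.0 = perfect discrimination, 0.5 = a coin flip) can be made concrete with a minimal sketch. This is not code from any of the reviewed studies; the labels and scores below are invented toy data, and the AUC is computed via its rank-based (Mann-Whitney) formulation:

```python
# AUC = probability that a randomly chosen positive case receives a
# higher model score than a randomly chosen negative case, with ties
# counted as 0.5. Toy data only, for illustration.

def auc(labels, scores):
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie
    return wins / (len(pos) * len(neg))

# A model that separates the classes perfectly scores 1.0:
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
# An uninformative model that gives every case the same score scores 0.5:
print(auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```

Accuracy, by contrast, depends on a chosen decision threshold, which is why the review reports both metrics.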

Results: For fracture detection, the AUC in five studies reflected near-perfect prediction (range, 0.95-1.0), and the accuracy in seven studies ranged from 83% to 98%. For fracture classification, the AUC was 0.94 in one study, and the accuracy in two studies ranged from 77% to 90%. In two studies, AI outperformed human examiners in detecting and classifying hip and proximal humerus fractures, and one study showed equivalent performance in detecting wrist, hand, and ankle fractures.

Conclusions: Preliminary experience with fracture detection and classification using AI shows promising performance. AI may enhance the processing and communication of probabilistic tasks in medicine, including orthopaedic surgery. At present, inadequate reference-standard assignments for training and testing AI are the biggest hurdle to integration into the clinical workflow. The next step will be to apply AI to more challenging diagnostic and therapeutic scenarios in which certainty is lacking. Future studies should also address legal regulation and better determine the feasibility of implementation in clinical practice.

Level of evidence: Level II, diagnostic study.


Conflict of interest statement

All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.

Figures

Fig. 1 A-E
Two common AI techniques exist. Supervised learning refers to iterative training of an algorithm on a dataset of input features with ground-truth labels. For example, radiographs of the wrist are provided as input features labeled fracture and no fracture. When given new wrist radiographs without labels, the algorithm learns to predict either class on its own. Unsupervised learning refers to exposure to data without ground-truth labels. During the training phase, the algorithm attempts to find labels that best organize the data (“clustering”). Generally, unsupervised learning requires more computational power and larger datasets, and its performance is more challenging to evaluate; therefore, supervised algorithms are more often used in medical applications. (A) Neural networks are loosely modeled on the interconnected neurons of the human brain. The blue dots represent input features, whereas the red dots are the output of the algorithm. The green dots mathematically weigh the input features to predict an output. (B) A support vector machine defines an optimal separating “hyperplane” that maximizes the distance to the closest points of two classes. (C) Linear discriminant analysis is a linear classification technique for distinguishing among three or more classes. (D) K-nearest neighbors classifies an input feature by a majority vote of its K closest neighbors. For instance, the unknown dot is assigned blue if K = 1 (inner circle), whereas it is assigned red if K = 5 (outer circle). (E) K-means groups objects based on their characteristics by iteratively assigning them to K centroids and minimizing each point’s distance to the center of its cluster. For example, three clusters are formed (K = 3): green, red, and blue dots.
Fig. 2
This flowchart depicts study selection during screening and inclusion of articles, for a search period from the start of each database to September 6, 2018.
Fig. 3
We conducted a quality assessment of included studies using a seven-item checklist based on a modified methodologic index for nonrandomized studies (MINORS) instrument.
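The K-nearest-neighbors rule described in the Fig. 1D caption, where the majority label among the K closest training points decides the class, can be sketched in a few lines. The points and labels below are invented to mirror the caption's example (one blue point nearest the query, four red points farther out), not data from any reviewed study:

```python
# Toy KNN classifier: rank training points by Euclidean distance to the
# query, then take a majority vote among the K nearest. Invented data.
from collections import Counter
import math

def knn_predict(train, query, k):
    # train: list of ((x, y), label) pairs
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# One blue point at distance 1, four red points at distance >= 2,
# echoing Fig. 1D: K = 1 picks blue, K = 5 flips the vote to red.
train = [((0.0, 1.0), "blue"),
         ((2.0, 0.0), "red"), ((0.0, -2.0), "red"),
         ((-2.0, 0.0), "red"), ((0.0, 2.5), "red")]
print(knn_predict(train, (0.0, 0.0), k=1))  # blue
print(knn_predict(train, (0.0, 0.0), k=5))  # red
```

The dependence of the prediction on K, visible here, is why the caption's inner and outer circles give different answers for the same unknown dot.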
