Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet

Nicholas Bien et al. PLoS Med. 2018 Nov 27;15(11):e1002699. doi: 10.1371/journal.pmed.1002699. eCollection 2018 Nov.

Abstract

Background: Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, which automatically learn hierarchical layers of features, are well suited for modeling the complex relationships between medical images and their interpretations. In this study, we developed a deep learning model for detecting general abnormalities and specific diagnoses (anterior cruciate ligament [ACL] tears and meniscal tears) on knee MRI exams. We then measured the effect of providing the model's predictions to clinical experts during interpretation.

Methods and findings: Our dataset consisted of 1,370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001, and December 31, 2012 (mean age 38.0 years; 569 [41.5%] female patients). The majority vote of 3 musculoskeletal radiologists established reference standard labels on an internal validation set of 120 exams. We developed MRNet, a convolutional neural network for classifying MRI series and combined predictions from 3 series per exam using logistic regression. In detecting abnormalities, ACL tears, and meniscal tears, this model achieved area under the receiver operating characteristic curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. We also obtained a public dataset of 917 exams with sagittal T1-weighted series and labels for ACL injury from Clinical Hospital Centre Rijeka, Croatia. On the external validation set of 183 exams, the MRNet trained on Stanford sagittal T2-weighted series achieved an AUC of 0.824 (95% CI 0.757, 0.892) in the detection of ACL injuries with no additional training, while an MRNet trained on the rest of the external data achieved an AUC of 0.911 (95% CI 0.864, 0.958). We additionally measured the specificity, sensitivity, and accuracy of 9 clinical experts (7 board-certified general radiologists and 2 orthopedic surgeons) on the internal validation set both with and without model assistance. Using a 2-sided Pearson's chi-squared test with adjustment for multiple comparisons, we found no significant differences between the performance of the model and that of unassisted general radiologists in detecting abnormalities. General radiologists achieved significantly higher sensitivity in detecting ACL tears (p-value = 0.002; q-value = 0.019) and significantly higher specificity in detecting meniscal tears (p-value = 0.003; q-value = 0.019). Using a 1-tailed t test on the change in performance metrics, we found that providing model predictions significantly increased clinical experts' specificity in identifying ACL tears (p-value < 0.001; q-value = 0.006). The primary limitations of our study include lack of surgical ground truth and the small size of the panel of clinical experts.
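The AUCs above are reported with 95% confidence intervals. As a minimal sketch of one common way to obtain such an interval, a nonparametric bootstrap over exams in Python (the abstract does not state the authors' exact CI procedure, so this is an illustrative assumption):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_auc_ci(y_true, y_score, n_boot=10000, alpha=0.05, seed=0):
        """Nonparametric bootstrap CI for AUC (one common approach)."""
        rng = np.random.default_rng(seed)
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        n = len(y_true)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)          # resample exams with replacement
            if len(np.unique(y_true[idx])) < 2:  # skip resamples with one class
                continue
            aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return roc_auc_score(y_true, y_score), (lo, hi)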

Conclusions: Our deep learning model can rapidly generate accurate clinical pathology classifications of knee MRI exams from both internal and external datasets. Moreover, our results support the assertion that deep learning models can improve the performance of clinical experts during medical imaging interpretation. Further research is needed to validate the model prospectively and to determine its utility in the clinical setting.


Conflict of interest statement

I have read the journal's policy and the authors of this manuscript have the following competing interests: CL is a shareholder of whiterabbit.ai and nines.ai. Since submitting this manuscript, RLB has joined and received stock options from Roam Analytics, whose mission is to use AI methodology to improve human health.

Figures

Fig 1. Experimental setup flowchart.
We retrospectively collected a dataset of 1,370 knee MRI examinations used to develop the model and to assess the model and clinical experts. Labels were prospectively obtained through manual extraction from clinical reports. Images were extracted from DICOM files, preprocessed, then linked to reports. The dataset was split into a training set (to develop the model), a tuning set (to choose among models), and a validation set (to assess the best model and clinical experts). The validation set DICOMs correspond to the same exams as the validation set images; the images, however, were preprocessed before input to the model. These validation exams were independently annotated by musculoskeletal (MSK) radiologists (MSK specialists), model-unassisted clinical experts, and model-assisted clinical experts.
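As a minimal sketch of the image-extraction step described above, assuming pydicom and an illustrative intensity normalization (the caption does not specify the exact preprocessing):

    import numpy as np
    import pydicom

    def load_series(dicom_paths):
        """Read one MRI series from DICOM files and stack it (slices, H, W)."""
        slices = [pydicom.dcmread(p) for p in dicom_paths]
        slices.sort(key=lambda ds: int(ds.InstanceNumber))  # order slices
        volume = np.stack([ds.pixel_array.astype(np.float32) for ds in slices])
        # Illustrative normalization to [0, 255]; the paper's exact
        # preprocessing is not given in this caption.
        volume -= volume.min()
        volume /= max(float(volume.max()), 1e-6)
        return volume * 255.0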
Fig 2. MRNet architecture.
The MRNet is a convolutional neural network (CNN) that takes as input a series of MRI images and outputs a classification prediction. AlexNet features from each slice of the MRI series are combined using a max pooling (element-wise maximum) operation. The resulting vector is fed through a fully connected layer to produce a single output probability. We trained a different MRNet for each task (abnormality, anterior cruciate ligament [ACL] tear, meniscal tear) and series type (sagittal, coronal, axial), resulting in 9 different MRNets (for external validation, we use only the sagittal plane ACL tear MRNet). For each model, the output probability represents the probability the model assigns to the presence of the diagnosis in the input series.
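A minimal PyTorch sketch of the architecture as this caption describes it: per-slice AlexNet features, an element-wise maximum across slices, and a single fully connected layer. The per-slice spatial pooling and layer sizes follow torchvision's AlexNet and are assumptions beyond the caption, not the authors' exact code:

    import torch
    import torch.nn as nn
    from torchvision import models

    class MRNet(nn.Module):
        """Per-slice AlexNet features -> max over slices -> one FC output."""
        def __init__(self):
            super().__init__()
            self.backbone = models.alexnet(weights="IMAGENET1K_V1").features
            self.pool = nn.AdaptiveAvgPool2d(1)   # collapse spatial dims per slice
            self.fc = nn.Linear(256, 1)           # AlexNet features have 256 channels

        def forward(self, x):                     # x: (slices, 3, H, W), one series
            feats = self.pool(self.backbone(x)).flatten(1)  # (slices, 256)
            pooled, _ = feats.max(dim=0)          # element-wise max over slices
            return torch.sigmoid(self.fc(pooled)) # probability for the series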
Fig 3. Class activation mappings for MRNet interpretation.
Class activation mappings (CAMs) highlight which pixels in the images are important for the model’s classification decision. One of the board-certified musculoskeletal radiologists annotated the images (white arrows and circles) and provided the following captions. (a) Sagittal T2-weighted image of the knee demonstrating large effusion (arrow) and rupture of the gastrocnemius tendon (ring), which were correctly localized by the model and classified as abnormal. Note that the model was not specifically trained to detect these pathologies but was able to recognize the abnormalities based on the contrast with the normal knee examinations. (b) Sagittal T2-weighted image of the knee complicated by a significant motion artifact demonstrating complete anterior cruciate ligament (ACL) tear (arrow), which was correctly classified and localized by the model. Because we hoped to best approximate the clinical practice reality—in which the prevalence of artifacts (e.g., motion, metallic) and other technical noise disrupts interpretation of knee MRI—we did not exclude noisy cases from the training or validation data. (c) Sagittal T2-weighted image of the knee demonstrating complete disruption of the ACL, which was correctly identified by the model as abnormal and classified as ACL tear. The CAM indicates the focus of the model at the abnormal attachment of the ACL (arrow). (d) Sagittal T2-weighted image of the knee demonstrating a complex tear involving the posterior horn of the lateral meniscus (arrow). While the model did classify this examination as abnormal, the CAM indicates that the increased subcutaneous signal (ring) in the anterior/lateral soft tissues contributed to the decision but the meniscal tear did not.
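In the standard CAM formulation (Zhou et al.), the map for a slice is the fully connected weights applied channel-wise to the final convolutional feature maps. A minimal sketch using the names from the MRNet sketch above (an illustration under that formulation, not the authors' exact code):

    import torch
    import torch.nn.functional as F

    def compute_cam(model, x, slice_idx):
        """Class activation map for one slice: FC-weighted sum of conv features."""
        with torch.no_grad():
            fmap = model.backbone(x)              # (slices, 256, h, w) conv features
            w = model.fc.weight.squeeze(0)        # (256,) FC weights for the output
            cam = torch.einsum("c,chw->hw", w, fmap[slice_idx])
            cam = F.relu(cam)                     # keep positive evidence only
            cam -= cam.min()
            cam /= cam.max().clamp_min(1e-6)      # normalize to [0, 1]
            # Upsample to the input resolution for overlay on the MRI slice
            return F.interpolate(cam[None, None], size=x.shape[-2:],
                                 mode="bilinear")[0, 0]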
Fig 4. Combining series predictions using logistic regression.
Each examination contains 3 types of series: sagittal, coronal, and axial. For each task (abnormality, ACL tear, meniscal tear), we trained a logistic regression classifier to combine the 3 probabilities output by the MRNets to produce a single predicted probability for the exam. The predicted probabilities from an exam in the internal validation set are shown as an example. ACL, anterior cruciate ligament; CNN, convolutional neural network; LR, logistic regression.
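A minimal scikit-learn sketch of this combining step, with synthetic stand-ins for the per-series probabilities (the real inputs would be MRNet outputs on the training exams):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Illustrative stand-ins for per-series MRNet probabilities and exam labels
    p_sag, p_cor, p_ax = rng.random((3, 100))
    y = (p_sag + p_cor + p_ax > 1.5).astype(int)   # toy labels for one task

    X = np.column_stack([p_sag, p_cor, p_ax])      # shape (n_exams, 3)
    combiner = LogisticRegression().fit(X, y)      # one classifier per task
    p_exam = combiner.predict_proba(X)[:, 1]       # single probability per exam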
Fig 5. Receiver operating characteristic curves of the model and operating points of unassisted and assisted clinical experts.
Each plot illustrates the receiver operating characteristic (ROC) curve of the algorithm (black curve) on the validation set for (a) abnormality, (b) anterior cruciate ligament (ACL) tear, and (c) meniscal tear. The ROC curve is generated by varying the discrimination threshold (used to convert the output probabilities to binary predictions). Individual clinical expert (specificity, sensitivity) points are also plotted, where the red x’s represent model-unassisted general radiologists, the orange x’s represent model-unassisted orthopedic surgeons, the green plus signs represent model-assisted general radiologists, and the blue plus signs represent model-assisted orthopedic surgeons. We also plot the macro-average of the model-unassisted clinical experts (black x’s) and the macro-average of the model-assisted clinical experts (black plus signs). Each unassisted clinical expert operating point is connected to its corresponding model-assisted operating point with a dashed line.
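A minimal sketch of how such a curve is generated from output probabilities, using scikit-learn with toy data in place of the validation set:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, 120)                               # toy labels
    y_score = np.clip(y_true * 0.5 + rng.random(120) * 0.6, 0, 1)  # toy probabilities

    # Each (fpr, tpr) point corresponds to one discrimination threshold
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")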
Fig 6. Comparison of unassisted and model-assisted performance metrics of clinical experts on the validation set.
Mean differences (with 95% CI error bars) in clinical experts’ performance metrics (model-assisted minus unassisted) for abnormality, anterior cruciate ligament (ACL) tear, and meniscal tear detection. Numerical values are provided in Table 3, and individual values are provided in S2 Table.
