[Preprint]. 2025 Apr 29:arXiv:2504.20405v1.

SCOPE-MRI: Bankart Lesion Detection as a Case Study in Data Curation and Deep Learning for Challenging Diagnoses

Sahil Sethi et al. ArXiv.

Abstract

While deep learning has shown strong performance in musculoskeletal imaging, existing work has largely focused on pathologies where diagnosis is not a clinical challenge, leaving more difficult problems underexplored, such as detecting Bankart lesions (anterior-inferior glenoid labral tears) on standard MRIs. Diagnosing these lesions is challenging due to their subtle imaging features, often leading to reliance on invasive MRI arthrograms (MRAs). This study introduces ScopeMRI, the first publicly available, expert-annotated dataset for shoulder pathologies, and presents a deep learning (DL) framework for detecting Bankart lesions on both standard MRIs and MRAs. ScopeMRI includes 586 shoulder MRIs (335 standard, 251 MRAs) from 558 patients who underwent arthroscopy. Ground truth labels were derived from intraoperative findings, the gold standard for diagnosis. Separate DL models for MRAs and standard MRIs were trained using a combination of CNNs and transformers, pre-trained on a public knee MRI dataset. Predictions from sagittal, axial, and coronal views were ensembled to optimize performance. The models were evaluated on a 20% hold-out test set (117 MRIs: 46 MRAs, 71 standard MRIs). The models achieved an AUC of 0.91 and 0.93, sensitivity of 83% and 94%, and specificity of 91% and 86% for standard MRIs and MRAs, respectively. Notably, model performance on non-invasive standard MRIs matched or surpassed that of radiologists interpreting MRAs. External validation on independent hospital data demonstrated initial generalizability across imaging protocols. This study demonstrates that DL models can achieve radiologist-level diagnostic performance on standard MRIs, reducing the need for invasive MRAs. By releasing ScopeMRI and a modular codebase for training and evaluating deep learning models on 3D medical imaging data, we aim to accelerate research in musculoskeletal imaging and support the development of new datasets for clinically challenging diagnostic tasks.
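The multi-view ensembling described above can be made concrete with a short Python sketch. The function name and the unweighted averaging of the three per-view probabilities are illustrative assumptions, not the study's exact aggregation scheme.

    import numpy as np

    def ensemble_views(prob_sagittal, prob_axial, prob_coronal, threshold=0.5):
        """Combine per-view tear probabilities into a single prediction.

        An unweighted mean is assumed here for illustration; the study
        ensembles sagittal, axial, and coronal model outputs.
        """
        prob = float(np.mean([prob_sagittal, prob_axial, prob_coronal]))
        return prob, prob >= threshold

    # Example with hypothetical per-view probabilities for one exam:
    prob, has_lesion = ensemble_views(0.81, 0.74, 0.66)
    print(f"ensemble probability={prob:.2f}, lesion predicted={has_lesion}")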

Keywords: Bankart Lesion; Computer-Aided Diagnosis; Deep Learning; Glenoid Labrum; Labral Tear; Magnetic Resonance Imaging (MRI); Medical Imaging; Orthopedic Surgery.

Conflict of interest statement

Conflict of interest/Competing interests: The authors declare no conflicts of interest.

Figures

Figure 1:
Bankart lesion on standard MRI (left) and MRI arthrogram (right) in the axial view. Images are from the same patient and depict the same tear. White circles mark the tear, as annotated by a shoulder/elbow fellowship-trained orthopedic surgeon.
Figure 2:
Data Collection and Labeling Protocol.
Figure 3:
Model Training & Inference. The setup for the 3D CNN differed only in that the entire preprocessed MRI volume was input directly to the model, with the output then fed into the classifier and sigmoid layers.
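As a rough illustration of the single-view pipeline in Figure 3, the PyTorch sketch below encodes each slice with a 2D backbone, pools features across slices, and applies the classifier and sigmoid layers. The AlexNet backbone, max-pooling across slices, and input shapes are assumptions for illustration; the study also evaluated Swin Transformer and ViT backbones.

    import torch
    import torch.nn as nn
    from torchvision.models import alexnet

    class SliceBackboneClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            # Pretrained weights (e.g., MRNet or ImageNet) would be loaded in practice.
            self.backbone = alexnet(weights=None).features
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.classifier = nn.Linear(256, 1)

        def forward(self, volume):
            # volume: (num_slices, 3, H, W), each slice replicated to 3 channels
            feats = self.pool(self.backbone(volume)).flatten(1)  # (num_slices, 256)
            pooled, _ = feats.max(dim=0)  # aggregate over slices (assumed max-pooling)
            return torch.sigmoid(self.classifier(pooled))

    model = SliceBackboneClassifier()
    prob = model(torch.randn(24, 3, 224, 224))  # one exam with 24 slices
    print(prob.item())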
Figure 4:
Heatmap illustrating the difference in validation AUC (MRNet - ImageNet) across model architectures (AlexNet, Swin Transformer, ViT) and view-modalities (sagittal, axial, coronal) for MRAs and standard MRIs. Positive values indicate higher performance with MRNet pretraining compared to ImageNet pretraining. Each cell represents the AUC difference for the corresponding model and view-modality pair, with results derived from the best-performing hyperparameter set for each model and view-modality. The AUC differences have been scaled by 100 for readability and are presented as percentages.
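The plotted quantity is simply the per-cell difference (MRNet AUC - ImageNet AUC) scaled by 100. A minimal sketch of the computation and plot, with placeholder AUC values standing in for the study's actual results:

    import numpy as np
    import matplotlib.pyplot as plt

    architectures = ["AlexNet", "Swin Transformer", "ViT"]
    views = ["sagittal", "axial", "coronal"]
    # Placeholder validation AUCs; the real values come from the study's sweeps.
    auc_mrnet = np.array([[0.88, 0.85, 0.83], [0.86, 0.84, 0.82], [0.84, 0.83, 0.81]])
    auc_imagenet = np.array([[0.85, 0.84, 0.84], [0.83, 0.84, 0.80], [0.82, 0.83, 0.82]])

    diff_pct = (auc_mrnet - auc_imagenet) * 100  # scaled by 100, as in Figure 4

    fig, ax = plt.subplots()
    im = ax.imshow(diff_pct, cmap="coolwarm", vmin=-5, vmax=5)
    ax.set_xticks(range(len(views)), views)
    ax.set_yticks(range(len(architectures)), architectures)
    fig.colorbar(im, label="AUC difference (MRNet - ImageNet), %")
    plt.show()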
Figure 5:
Distribution of receiver operating characteristic (ROC) area under the curve (AUC) values across eight cross-validation splits for each view-modality’s final selected architecture. ROC AUC quantifies the model’s ability to distinguish between classes. Each box shows the interquartile range (IQR, 25th–75th percentile), with whiskers extending to 1.5 times the IQR. The horizontal line within each box represents the median AUC, while green triangles indicate the mean AUC. Black dots depict individual split AUCs, with dots outside the whiskers representing outliers. This visualization demonstrates the model’s performance stability on the hold-out test set across sagittal, axial, and coronal views for both standard MRIs and MRI arthrograms (MRAs).
Figure 6:
Receiver operating characteristic (ROC) curves for single-view models and the multi-view ensemble, compared to radiologist performance on (a) internal standard MRIs, (b) MRI arthrograms (MRAs), and (c) external standard MRIs. The single-view models correspond to those included in the multi-view ensemble. Shaded regions around each curve represent 95% confidence intervals, calculated through bootstrapping with 1000 iterations. Radiologist performance is marked with red x symbols, illustrating sensitivity and false positive rates derived from original radiology reports (internal datasets only). The dashed diagonal line indicates the performance of a random classifier (AUC = 0.50).
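The shaded bands come from a percentile bootstrap with 1000 iterations. A minimal sketch of a bootstrapped AUC confidence interval, assuming case-level resampling with replacement (the exact resampling unit is an assumption here):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), len(y_true))  # resample cases with replacement
            if len(np.unique(y_true[idx])) < 2:  # AUC needs both classes present
                continue
            aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return roc_auc_score(y_true, y_score), (lo, hi)

    # Example with synthetic labels and scores (117 cases, as in the test set):
    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, 117)
    s = np.clip(y * 0.4 + rng.normal(0.3, 0.25, 117), 0, 1)
    print(bootstrap_auc_ci(y, s))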
Figure 7:
Gradient-weighted class activation mapping (Grad-CAM) visualizations for Bankart lesion detection on MRAs (left) and standard MRIs (right) for the axial view. Cases with and without Bankart lesions are presented. The model correctly classified all four cases. White circles highlight the anterior labrum (the region of interest), annotated by a shoulder/elbow fellowship-trained orthopedic surgeon. Heatmaps indicate regions most influential to the model’s prediction, with warmer colors (red/yellow) signifying higher relevance.
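Grad-CAM itself is model-agnostic: the activations of a chosen convolutional layer are weighted by their spatially pooled gradients and passed through a ReLU. A minimal PyTorch sketch, with the backbone and target layer chosen purely for illustration rather than taken from the study's code:

    import torch
    import torch.nn.functional as F
    from torchvision.models import alexnet

    def grad_cam(model, target_layer, image):
        acts, grads = {}, {}
        h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
        h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
        score = model(image).squeeze()
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pool gradients spatially
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]

    net = alexnet(weights=None)
    net.classifier[-1] = torch.nn.Linear(4096, 1)  # binary head for illustration
    heatmap = grad_cam(net, net.features[-3], torch.randn(1, 3, 224, 224))
    print(heatmap.shape)  # torch.Size([1, 1, 224, 224])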
