2025 Dec 1. doi: 10.1038/s44387-025-00043-5. Online ahead of print.

SCOPE-MRI: Bankart Lesion Detection as a Case Study in Data Curation and Deep Learning for Challenging Diagnoses


Sahil Sethi et al. NPJ Artif Intell.

Abstract

Deep learning has shown strong performance in musculoskeletal imaging, but prior work has largely targeted conditions where diagnosis is relatively straightforward. More challenging problems remain underexplored, such as detecting Bankart lesions (anterior-inferior glenoid labral tears) on standard MRIs. These lesions are difficult to diagnose due to subtle imaging features, often necessitating invasive MRI arthrograms (MRAs). We introduce ScopeMRI, the first publicly available, expert-annotated dataset for shoulder pathologies, and present a deep learning framework for Bankart lesion detection on both standard MRIs and MRAs. ScopeMRI contains shoulder MRIs from patients who underwent arthroscopy, providing ground-truth labels from intraoperative findings, the diagnostic gold standard. Separate models were trained for MRIs and MRAs using CNN- and transformer-based architectures, with predictions ensembled across multiple imaging planes. Our models achieved radiologist-level performance, with accuracy on standard MRIs surpassing radiologists interpreting MRAs. External validation on independent hospital data demonstrated initial generalizability across imaging protocols. By releasing ScopeMRI and a modular codebase for training and evaluation, we aim to accelerate research in musculoskeletal imaging and foster development of datasets and models that address clinically challenging diagnostic tasks.
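The abstract describes ensembling predictions across multiple imaging planes. The paper does not specify the aggregation rule here, but a minimal sketch of plane-level probability averaging (function name and example values are hypothetical) looks like this:

```python
import numpy as np

# Hypothetical per-plane lesion probabilities for one exam (illustrative values).
plane_probs = {"sagittal": 0.62, "axial": 0.81, "coronal": 0.55}

def ensemble_probability(probs: dict) -> float:
    """Average the per-plane probabilities into a single lesion score."""
    return float(np.mean(list(probs.values())))

score = ensemble_probability(plane_probs)
prediction = int(score >= 0.5)  # 1 = Bankart lesion predicted
```

Averaging is only one plausible combiner; weighted averaging or logit-level pooling would follow the same pattern.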


Conflict of interest statement

Conflict of interest/Competing interests: The authors declare no conflicts of interest.

Figures

Figure 1:
Bankart lesion on standard MRI (left) and MRI arthrogram (right) in the axial view. Images are from the same patient and depict the same tear. White circles are annotations identifying the tear, provided by a shoulder/elbow fellowship-trained orthopedic surgeon.
Figure 2:
Heatmap illustrating the difference in validation AUC (MRNet - ImageNet) across model architectures (AlexNet, Swin Transformer, ViT) and view-modalities (sagittal, axial, coronal) for MRAs and standard MRIs. Positive values indicate higher performance with MRNet pretraining compared to ImageNet pretraining. Each cell represents the AUC difference for the corresponding model and view-modality pair, with results derived from the best-performing hyperparameter set for each model and view-modality. The AUC differences have been scaled by 100 for readability and are presented as percentages.
Figure 3:
Distribution of receiver operating characteristic (ROC) area under the curve (AUC) values across eight cross-validation splits for each view-modality’s final selected architecture. ROC AUC quantifies the model’s ability to distinguish between classes. Each box shows the interquartile range (IQR, 25th–75th percentile), with whiskers extending to 1.5 times the IQR. The horizontal line within each box represents the median AUC, while green triangles indicate the mean AUC. Black dots depict individual split AUCs, with dots outside the whiskers representing outliers. This visualization demonstrates the model’s performance stability across the validation folds for sagittal, axial, and coronal views for both standard MRIs and MRI arthrograms (MRAs).
Figure 4:
Receiver operating characteristic (ROC) curves for single-view models and the multi-view ensemble compared to radiologist performance. Results are shown for (a) internal standard MRIs, (b) MRI arthrograms (MRAs), and (c) external standard MRIs. The single-view models correspond to those included in the multi-view ensemble. Shaded regions around each curve represent 95% confidence intervals, calculated through bootstrapping with 1000 iterations. Radiologist performance is marked with red X symbols, illustrating sensitivity and false positive rates derived from original radiology reports (internal datasets only). The dashed diagonal line indicates the performance of a random classifier (AUC = 0.50).
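The caption above mentions 95% confidence intervals computed via bootstrapping with 1000 iterations. A minimal sketch of a percentile-bootstrap CI for ROC AUC (the paper's exact resampling details are not given; the function name and defaults are assumptions) could be:

```python
import numpy as np

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ROC AUC, using the rank-based AUC
    (probability a random positive outranks a random negative; ties count half)."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)

    def auc(yt, ys):
        diff = ys[yt == 1][:, None] - ys[yt == 0][None, :]
        return (diff > 0).mean() + 0.5 * (diff == 0).mean()

    stats, n = [], len(y_true)
    while len(stats) < n_boot:
        idx = rng.integers(0, n, n)          # resample cases with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                          # resample must contain both classes
        stats.append(auc(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auc(y_true, y_score), (lo, hi)
```

The same resampling loop extends to point-wise CIs on the ROC curve itself by recording the full curve for each bootstrap sample.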
Figure 5:
Gradient-weighted class activation mapping (Grad-CAM) visualizations for Bankart lesion detection on MRAs (left) and standard MRIs (right) for the axial view. Cases with and without Bankart lesions are presented. The model correctly classified all four cases. White circles highlight the anterior labrum (the region of interest), annotated by a shoulder/elbow fellowship-trained orthopedic surgeon. Heatmaps indicate regions most influential to the model’s prediction, with warmer colors (red/yellow) signifying higher relevance.
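The core Grad-CAM computation behind these heatmaps is standard: gradients of the lesion logit with respect to a convolutional layer's activations are global-average-pooled into channel weights, which weight the activation maps before a ReLU. A framework-agnostic NumPy sketch (array shapes are assumptions; in practice the activations and gradients come from the trained network):

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the target logit w.r.t. those activations (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))        # global-average-pool grads -> (C,)
    cam = np.tensordot(weights, activations, 1)  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                   # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                    # normalize to [0, 1] for display
    return cam
```

The resulting (H, W) map is upsampled to the input slice's resolution and overlaid as the red/yellow heatmap shown in the figure.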
Figure 6:
Data Collection and Labeling Protocol.
Figure 7:
Model Training & Inference. (A) Schematic of 2D model training setup using 3D MRIs. (B) Schematic of the multi-view ensemble for inference. The 3D CNN setup differed only in that the entire preprocessed MRI volume was input directly into the model, with the output fed into the classifier and sigmoid layers.
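Applying a 2D backbone to a 3D MRI volume, as in panel (A), typically means running the backbone on every slice and pooling the per-slice features before classification. A minimal sketch of that pattern, with `feature_fn` standing in for the CNN/transformer backbone and max-pooling as an assumed aggregation (the paper's exact pooling is not stated in this caption):

```python
import numpy as np

def slicewise_probability(volume, feature_fn, classifier_weights, bias=0.0):
    """2D-on-3D inference sketch: run a 2D backbone on each slice, pool the
    per-slice feature vectors across the depth axis, then apply a linear
    classifier followed by a sigmoid."""
    feats = np.stack([feature_fn(s) for s in volume])  # (depth, feat_dim)
    pooled = feats.max(axis=0)                         # aggregate across slices
    logit = float(pooled @ classifier_weights + bias)
    return 1.0 / (1.0 + np.exp(-logit))                # sigmoid -> probability
```

For the 3D CNN described above, the per-slice loop is replaced by a single forward pass over the whole volume, with the classifier and sigmoid unchanged.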
