J Cachexia Sarcopenia Muscle. 2025 Jun;16(3):e13815.
doi: 10.1002/jcsm.13815.

Myo-Guide: A Machine Learning-Based Web Application for Neuromuscular Disease Diagnosis With MRI

Jose Verdu-Diaz et al. J Cachexia Sarcopenia Muscle. 2025 Jun.

Abstract

Background: Neuromuscular diseases (NMDs) are rare disorders characterized by progressive muscle fibre loss, leading to replacement by fibrotic and fatty tissue, muscle weakness and disability. Early diagnosis is critical for therapeutic decisions, care planning and genetic counselling. Muscle magnetic resonance imaging (MRI) has emerged as a valuable diagnostic tool by identifying characteristic patterns of muscle involvement. However, the increasing complexity of these patterns complicates their interpretation, limiting their clinical utility. Additionally, multi-study data aggregation introduces heterogeneity challenges. This study presents a novel multi-study harmonization pipeline for muscle MRI and an AI-driven diagnostic tool to assist clinicians in identifying disease-specific muscle involvement patterns.

Methods: We developed a preprocessing pipeline to standardize MRI fat content across datasets, minimizing source bias. An ensemble of XGBoost models was trained to classify patients based on intramuscular fat replacement, age at MRI and sex. The SHapley Additive exPlanations (SHAP) framework was adapted to analyse model predictions and identify disease-specific muscle involvement patterns. To address class imbalance, training and evaluation were conducted using class-balanced metrics. The model's performance was compared against four expert clinicians using 14 previously unseen MRI scans.
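Balanced accuracy, reported in the Results, is one class-balanced metric. A minimal sketch follows, assuming the common definition (the mean of per-class recall); the abstract does not state the exact implementation used in the study, and the labels below are hypothetical toy data, not study data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally
    regardless of how many samples it has."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy labels (not study data).
y_true = ["DMD"] * 8 + ["CAPN3"] * 2
y_pred = ["DMD"] * 8 + ["DMD", "CAPN3"]

# Plain accuracy rewards getting the majority class right.
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Balanced accuracy averages recall for DMD (8/8) and CAPN3 (1/2).
bal = balanced_accuracy(y_true, y_pred)
print(f"plain={plain:.2f} balanced={bal:.2f}")
```

In this toy case the plain accuracy is 0.90 while the balanced accuracy is 0.75, which illustrates why a rare class that is often misclassified lowers a class-balanced score more than an unweighted one.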

Results: Using our harmonization approach, we curated a dataset of 2961 MRI samples from genetically confirmed cases of 20 paediatric and adult NMDs. The model achieved a balanced accuracy of 64.8% ± 3.4%, with a weighted top-3 accuracy of 84.7% ± 1.8% and top-5 accuracy of 90.2% ± 2.4%. It also identified key features relevant for differential diagnosis, aiding clinical decision-making. Compared to four expert clinicians, the model obtained the highest top-3 accuracy (75.0% ± 4.8%). The diagnostic tool has been implemented as a free web platform, providing global access to the medical community.
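The top-k figures above can be read as follows: a prediction counts as correct when the true diagnosis appears among the k highest-ranked classes, and the ± term reflects variation across ensemble members. The sketch below is an illustration under stated assumptions: it computes an unweighted top-k accuracy (the weighting used in the study is not described in the abstract), uses the population standard deviation, and runs on hypothetical probabilities for two hypothetical ensemble members.

```python
from statistics import mean, pstdev


def top_k_accuracy(y_true, prob_rows, classes, k):
    """Fraction of samples whose true class is among the k classes
    with the highest predicted probability."""
    hits = 0
    for truth, probs in zip(y_true, prob_rows):
        ranked = sorted(range(len(classes)), key=lambda i: probs[i], reverse=True)
        top_classes = {classes[i] for i in ranked[:k]}
        hits += truth in top_classes
    return hits / len(y_true)


# Hypothetical toy data: class order and probabilities are made up.
classes = ["DMD", "CAPN3", "GNE", "SMN1"]
y_true = ["DMD", "GNE", "SMN1"]
member_a = [
    [0.60, 0.20, 0.10, 0.10],
    [0.50, 0.30, 0.15, 0.05],
    [0.40, 0.30, 0.20, 0.10],
]
member_b = [
    [0.15, 0.60, 0.20, 0.05],
    [0.10, 0.20, 0.65, 0.05],
    [0.30, 0.10, 0.20, 0.40],
]

# One top-3 score per ensemble member, then mean and spread.
top3_scores = [top_k_accuracy(y_true, m, classes, 3) for m in (member_a, member_b)]
top3_mean = mean(top3_scores)
top3_std = pstdev(top3_scores)  # assumption: population standard deviation
print(f"top-3 accuracy: {top3_mean:.3f} +/- {top3_std:.3f}")
```

Ranking-based metrics of this kind suit a differential diagnosis setting, where a short list of candidate diseases is useful even when the first-ranked candidate is wrong.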

Conclusions: The application of AI in muscle MRI for NMD diagnosis remains underexplored due to data scarcity. This study introduces a framework for dataset harmonization, enabling advanced computational techniques. Our findings demonstrate the potential of AI-based approaches to enhance differential diagnosis by identifying disease-specific muscle involvement patterns. The developed tool surpasses expert performance in diagnostic ranking and is accessible to clinicians worldwide via the Myo-Guide online platform.

Keywords: MRI; artificial intelligence; differential diagnosis; machine learning; neuromuscular diseases.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
(a) Representation of the different muscle fat scales found in the dataset. Each row represents a different scale, and each coloured circle represents a possible value. The fat fraction scale is continuous, and the rest are discrete. Discrete scales are projected over the FF scale to show differences in muscle fat quantisation. (b) MRI examples of healthy muscle, (c) intermediate‐stage muscle and (d) late‐stage muscle. Fatty tissue appears with high intensity in the MRIs (white/light grey), whereas muscle tissue appears with low intensity (dark grey). The top row of the MRIs represents a pelvis/waist‐level axial slice, the second row represents a thigh axial slice, and the last row represents a calf axial slice.
FIGURE 2
Distribution of muscle fat scores before (a) and after (b) processing for CAPN3, DMD and SMN1. Normalized stacked densities are shown for each different scale. The discrete values shown in Figure 1a are highlighted in the left figure. Scores outside of the discretised values correspond to mean values from the left and right legs. (c) Heatmap of the data for SarcoG. Patient samples are shown in rows and features in columns. Rows are sorted by mean fat score, with late‐stage patients in the upper rows and early‐stage patients in the lower rows. Asymmetry is calculated as the difference between each left and right muscle, and the mean and standard deviation of the asymmetry of all muscles are added as features for each patient. Empty (white) spaces represent missing data. Muscle abbreviations: biceps femoris long head (bflh), biceps femoris short head (bfsh), flexor hallucis longus (fhl) and flexor digitorum longus (fdl). The extensor digitorum longus and extensor hallucis longus have been grouped and named ‘extensors’.
FIGURE 3
Evaluation plots for the XGBoost model ensemble. (a) Confusion matrix of the model with the test data. The ground truth is shown in rows and the model predictions are in columns. The diagonal corresponds to the correctly predicted samples. (b) Confusion matrix normalized by ground truth (rows). (c) One‐vs‐Rest receiver operating characteristic (ROC) curves for each disease (blue) and micro‐ and macro‐averaged curves (red). The area under the curve for the micro‐ and macro‐averaged curves is shown in the legend. (d) One‐vs‐Rest precision–recall curves for each disease (blue) and micro‐ and macro‐averaged curves (red). The area under the precision–recall curve for the micro‐ and macro‐averaged curves is shown in the legend.
FIGURE 4
Average One‐vs‐One precision–recall curves. The areas under the precision–recall curves (AUPRC) are represented for each pair of diseases. Histograms of the mean fat score for each NMD are shown in the diagonal.
FIGURE 5
Model explainability using SHAP values. (a) Clustered mean absolute SHAP value. Features are shown in columns and diseases are shown in rows. The absolute SHAP value gives an overall indication of the importance of each feature. (b) SHAP values of the 10 most important features in predicting SarcoG, with each patient represented as a dot. Positive SHAP values indicate a positive impact on the prediction (an increase in the odds of predicting the target disease), and negative SHAP values indicate a negative impact. Feature values are colour‐coded: ‘High’ is equivalent to the maximum feature value, and ‘low’ is equivalent to the minimum feature value. (c) Clustered mean absolute interaction values for SarcoG. The main effect values (diagonal) have been set to 0 to avoid obscuring the interaction values. (d) Interaction values between age and soleus for SarcoG, showing all patients. (e) Interaction values between age and soleus for SarcoG, only showing patients diagnosed with SarcoG (ground truth). Age is normalized to a range of −100 to 100.
FIGURE 6
(a) Top‐K accuracy curves of the model (Myo‐Guide) and experts in the final AI versus experts' experiment. Error bars are available for the model, representing the standard deviation of the model ensemble. The Top‐K accuracy curve of a random classifier is also provided. (b) Confusion matrix of the AI versus experts' experiment, with each answer represented as a bar. Each expert (and model) is colour‐coded, and the ranking of each choice is represented by the length of the bar. The predicted diagnoses not included in the test set (TTN, HypoPP, SMN1, OPDM, PYGM and CLCN1) are separated from the rest for visual clarity. Bars on the diagonal represent correct predictions, and bars off the diagonal represent incorrect predictions. (c) mMRI scan of P3 showing thigh and lower leg. (d) Waterfall plot of the SHAP values for P3 when predicting GNE. The y‐axis represents the features (with values) sorted by decreasing importance (top to bottom). The x‐axis represents the raw output of the model (in logits). The plot shows the impact each feature had on the model output. Note that all feature values are scaled to a range of −100 to 100 (including age and asymmetry).
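
The asymmetry features described in the Figure 2 caption (the difference between each left and right muscle, plus the mean and standard deviation of those differences per patient) can be sketched as below. This is an illustration only: the sign convention (left minus right), the use of the population standard deviation, the skipping of muscles with missing scores, and the function and key names are all assumptions, and the scores are hypothetical.

```python
from statistics import mean, pstdev


def asymmetry_features(left, right):
    """Per-muscle left-minus-right differences and their summary statistics.

    `left` and `right` map muscle names to fat scores; None marks missing data.
    Muscles missing on either side are skipped (an assumption).
    """
    diffs = {
        muscle: left[muscle] - right[muscle]
        for muscle in left
        if left[muscle] is not None and right.get(muscle) is not None
    }
    values = list(diffs.values())
    if not values:
        return {"per_muscle": {}, "asym_mean": None, "asym_std": None}
    return {
        "per_muscle": diffs,
        "asym_mean": mean(values),
        "asym_std": pstdev(values),  # assumption: population standard deviation
    }


# Hypothetical fat scores for one patient; fhl is missing on the left side.
left = {"soleus": 60.0, "bflh": 40.0, "fhl": None}
right = {"soleus": 50.0, "bflh": 44.0, "fhl": 20.0}

features = asymmetry_features(left, right)
print(features)
```

Here the soleus and bflh differences are 10.0 and −4.0, giving a mean asymmetry of 3.0 and a standard deviation of 7.0, while fhl is left out because one side is missing.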
