[Preprint]. 2025 May 23:2025.05.19.25327707.
doi: 10.1101/2025.05.19.25327707.

FLAMeS: A Robust Deep Learning Model for Automated Multiple Sclerosis Lesion Segmentation

Emma Dereskewicz et al. medRxiv.

Abstract

Background and purpose: Assessment of brain lesions on MRI is crucial for research in multiple sclerosis (MS). Manual segmentation is time-consuming and inconsistent. We aimed to develop an automated MS lesion segmentation algorithm for T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI.

Methods: We developed FLAIR Lesion Analysis in Multiple Sclerosis (FLAMeS), a deep learning MS lesion segmentation algorithm built on the nnU-Net 3D full-resolution U-Net and trained on 668 FLAIR scans acquired at 1.5 and 3 tesla from persons with MS. FLAMeS was evaluated on three external datasets: MSSEG-2 (n=14), MSLesSeg (n=51), and a clinical cohort (n=10), and compared to SAMSEG, LST-LPA, and LST-AI. Performance was assessed qualitatively by two blinded experts and quantitatively by comparing automated and ground truth lesion masks using standard segmentation metrics.
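
As an illustration of the quantitative comparison described in the Methods, the sketch below computes a few common voxel-wise overlap metrics between an automated mask and a ground truth mask. It is not the authors' evaluation code. The function name overlap_metrics, the flattened 0/1 list representation, and the signed definition of relative volume difference are assumptions made for this example; the abstract does not give the exact definitions used.

```python
def overlap_metrics(pred, truth):
    """Voxel-wise overlap metrics for two flattened binary masks (sequences of 0/1).

    Hypothetical helper for illustration; the definitions are common conventions,
    not necessarily those used in the paper.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    tp = sum(1 for p, t in zip(pred, truth) if p and t)       # lesion in both masks
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)   # lesion only in prediction
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)   # lesion only in ground truth

    def ratio(num, den):
        # Return NaN instead of raising when a mask is empty.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "ppv": ratio(tp, tp + fp),            # positive predictive value
        "tpr": ratio(tp, tp + fn),            # true positive rate (sensitivity)
        # Signed relative volume difference (assumed convention): (V_pred - V_truth) / V_truth
        "rel_volume_diff": ratio((tp + fp) - (tp + fn), tp + fn),
    }


# Toy example with eight voxels.
truth = [0, 1, 1, 1, 0, 0, 1, 0]
pred = [0, 1, 1, 0, 0, 1, 1, 0]
metrics = overlap_metrics(pred, truth)
print(metrics)
```

In this toy case three voxels overlap, one is a false positive, and one is a false negative, so Dice, PPV, and TPR all equal 0.75 and the volume difference is 0.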

Results: In a blinded qualitative review of 20 scans, both raters selected FLAMeS as the most accurate segmentation in 15 cases, with one rater favoring FLAMeS in two additional cases. Across all testing datasets, FLAMeS achieved a mean Dice score of 0.74, a true positive rate of 0.84, and an F1 score of 0.78, consistently outperforming the benchmark methods. For other metrics, including positive predictive value, relative volume difference, and false positive rate, FLAMeS performed similarly to or better than the benchmark methods. Most lesions missed by FLAMeS were smaller than 10 mm³, whereas the benchmark methods missed larger lesions in addition to smaller ones.
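
The size-dependent detection result can be made concrete with a small sketch of lesion-wise detection. Again, this is an illustration under stated assumptions rather than the authors' method: lesions are taken to be 6-connected components of the ground truth mask, a lesion counts as detected if at least one of its voxels is also in the automated mask, and the names connected_components and detection_by_size, the set-of-coordinates representation, and the default 1 mm³ voxel volume are hypothetical. Only the 10 mm³ cutoff comes from the abstract.

```python
from collections import deque

# Face-adjacent neighbours (6-connectivity); an assumption, other connectivities are common.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_components(voxels):
    """Split a collection of (x, y, z) voxels into 6-connected components."""
    remaining = set(voxels)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def detection_by_size(truth_voxels, pred_voxels,
                      voxel_volume_mm3=1.0, small_threshold_mm3=10.0):
    """Fraction of ground truth lesions detected, split at a volume threshold."""
    pred = set(pred_voxels)
    counts = {"small": [0, 0], "large": [0, 0]}  # [detected, total] per size bin
    for lesion in connected_components(truth_voxels):
        volume = len(lesion) * voxel_volume_mm3
        size_bin = "small" if volume < small_threshold_mm3 else "large"
        counts[size_bin][1] += 1
        if lesion & pred:  # assumed rule: any overlapping voxel counts as detection
            counts[size_bin][0] += 1
    return {k: (d / t if t else float("nan")) for k, (d, t) in counts.items()}


# Toy ground truth: one 12-voxel lesion and two small lesions (2 voxels and 1 voxel).
truth = {(x, 0, 0) for x in range(12)} | {(20, 0, 0), (21, 0, 0)} | {(30, 5, 5)}
# Toy prediction: overlaps the large lesion and one of the two small lesions.
pred = {(3, 0, 0), (4, 0, 0), (20, 0, 0)}
rates = detection_by_size(truth, pred)
print(rates)
```

Here the large lesion and one of the two small lesions are detected, giving detection rates of 1.0 for large lesions and 0.5 for small ones.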

Conclusions: FLAMeS is an accurate, robust method for MS lesion segmentation that outperforms other publicly available methods.

Conflict of interest statement

Dr. Reich has received research funding from Abata and Sanofi, unrelated to this paper. The other authors declare no competing financial interests.

Figures

Figure 1. FLAMeS generates qualitatively superior lesion masks.
The top row shows an axial FLAIR slice and the ground truth lesion mask. The bottom row displays the lesion masks from each method overlaid on the FLAIR image. Multiple lesions are missed (yellow arrows), undersegmented (blue arrows), or falsely segmented (orange arrow) by LST-AI and SAMSEG.

Figure 2. Assessment of false-positive voxels and apparently false-positive lesions identified by automated methods.
(A) Oversegmentation of diffusely abnormal white matter surrounding lesions by FLAMeS and SAMSEG. (B) Example of a periventricular hyperintensity segmented by all three automated methods but not present on the MSLesSeg ground truth mask (yellow arrow). On manual review, this area was judged to be a missed lesion on the original ground truth masks.

Figure 3. FLAMeS outperforms SAMSEG and LST-AI in DSC and LTPR.
Box plots display the median (center line), interquartile range (IQR, box edges), and whiskers extending up to 1.5 × IQR.

Figure 4. Lesion detection rates by lesion size for each segmentation method.
(A) Distribution of lesion volumes and detection rates by lesion size across all three testing datasets, comparing FLAMeS to SAMSEG. (B) Lesion volume distribution and detection rates for the MSLesSeg and clinical testing datasets only, comparing FLAMeS to LST-AI.

Figure 5. FLAMeS interactive web app.
