[Preprint]. 2025 May 23:arXiv:2412.18389v2.

Agreement of Image Quality Metrics with Radiological Evaluation in the Presence of Motion Artifacts

Elisa Marchetto et al. ArXiv.

Abstract

Object: Reliable image quality assessment is crucial for evaluating new motion correction methods for magnetic resonance imaging. We compare the performance of common reference-based and reference-free image quality metrics on unique datasets with real motion artifacts, and analyze the metrics' robustness to typical pre-processing techniques.

Materials and methods: We compared five reference-based and five reference-free metrics on brain data acquired with and without intentional motion (2D and 3D sequences). The metrics were recalculated seven times with varying pre-processing steps. Spearman correlation coefficients were computed to assess the relationship between image quality metrics and radiological evaluation.

Results: All reference-based metrics showed strong correlation with observer assessments. Among the reference-free metrics, Average Edge Strength offered the most promising results, consistently displaying stronger correlations across all sequences than the other reference-free metrics. The strongest correlation was achieved with percentile normalization and with restricting the metric evaluation to the skull-stripped brain region. In contrast, correlations were weaker when no brain mask was applied and when min-max or no normalization was used.

Discussion: Reference-based metrics reliably correlate with radiological evaluation across different sequences and datasets. Pre-processing significantly influences correlation values. Future research should focus on refining pre-processing techniques and exploring approaches for automated image quality evaluation.

Keywords: Artifacts; Data Quality; Magnetic Resonance Imaging; Metrics; Motion.

Conflict of interest statement

The authors declare no potential conflicts of interest.

Figures

Fig. 1
Different pre-processing choices are involved in calculating IQMs. We vary three of the common pre-processing steps, namely masking, normalization, and the reduction method for the IQM values. The brain mask was either ignored, multiplied with the images, or the metric was evaluated only within brain-mask voxels. Images were either not normalized or normalized with min-max, mean-std, or percentile normalization (except for FSIM, VIF, and LPIPS, which require specific image values, as shown in Table 1). IQM values across slices were reduced by calculating the mean value or taking the worst value of all slices (min/max depending on the IQM).
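As a concrete illustration of these three choices, the following is a minimal Python sketch of the masking, normalization, and slice-reduction options. Function names and the 1st/99th percentile bounds are our own assumptions, not the paper's published code.

```python
import numpy as np

def normalize(img, mode="percentile"):
    """Rescale image intensities; 'percentile' maps the 1st/99th
    percentiles to [0, 1] (assumed bounds, not specified here)."""
    if mode == "minmax":
        return (img - img.min()) / (img.max() - img.min())
    if mode == "meanstd":
        return (img - img.mean()) / img.std()
    if mode == "percentile":
        lo, hi = np.percentile(img, [1, 99])
        return (img - lo) / (hi - lo)
    return img  # mode == "none"

def apply_mask(img, mask, mode="multiply"):
    """'multiply' zeroes non-brain voxels but keeps the full matrix;
    'mask' restricts evaluation to brain voxels only (a 1D array),
    which is why metrics needing the full image matrix, such as
    FSIM/VIF/LPIPS, are unavailable in that setting."""
    if mode == "multiply":
        return img * mask
    if mode == "mask":
        return img[mask > 0]
    return img  # mode == "none"

def reduce_slices(values, reduction="worst", lower_is_better=False):
    """Collapse per-slice IQM values to a single number."""
    values = np.asarray(values)
    if reduction == "mean":
        return values.mean()
    # 'worst': max for lower-is-better IQMs, min otherwise
    return values.max() if lower_is_better else values.min()
```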
Fig. 2
Overview of the correlation analysis between image quality metrics and observer scores. Each 3D image volume was evaluated by one neuroradiologist for the CUBRIC dataset and by two neuroradiologists and two radiographers for the NRU dataset. For the latter, the scores were averaged with double weight on the more experienced neuroradiologists. IQMs were computed with various pre-processing choices (compare Fig. 1), as illustrated here using SSIM as an example. IQM values and observer scores of all images were then used to calculate the Spearman correlation coefficient, measuring the agreement between IQMs and observers.
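A minimal sketch of this analysis, assuming the per-rater scores for the NRU dataset are stored one row per image; the double weighting of the neuroradiologists follows the caption, while the toy numbers and variable names are ours.

```python
import numpy as np
from scipy.stats import spearmanr

# rows: images; columns: [neuroradiologist 1, neuroradiologist 2,
#                         radiographer 1, radiographer 2]
scores = np.array([[4, 4, 3, 4],
                   [2, 3, 2, 2],
                   [1, 1, 2, 1]], dtype=float)
weights = np.array([2, 2, 1, 1], dtype=float)   # double weight on neuroradiologists
observer_score = scores @ weights / weights.sum()  # weighted average per image

iqm_values = np.array([0.91, 0.72, 0.55])          # e.g. SSIM per image volume
rho, p = spearmanr(iqm_values, observer_score)
if p < 0.05:
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```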
Fig. 3
(A) Spearman correlation coefficient ρ between IQMs (x-axis) and observer scores for the four sequences of the NRU dataset (y-axis). Values are provided only for statistically significant correlations (p < 0.05), and values corresponding to a strong correlation (|ρ| > 0.6) are colored in blue and red. The metrics were calculated with the pre-processing settings {Multiply, Percentile, Worst}. (B) Median rank of each IQM, resulting from ranking the absolute values of the correlation coefficients for each sequence and taking the median across sequences.
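The median-rank summary in (B) can be computed along these lines; the correlation values below are illustrative placeholders, not the figure's numbers.

```python
import numpy as np
from scipy.stats import rankdata

# rho[i, j]: Spearman coefficient of IQM j with observer scores for sequence i
rho = np.array([[0.85, 0.62, 0.30],
                [0.78, 0.70, 0.41],
                [0.90, 0.55, 0.25],
                [0.81, 0.66, 0.38]])

# Rank IQMs within each sequence by |rho| (rank 1 = strongest correlation),
# then take the median rank across the four sequences.
ranks = rankdata(-np.abs(rho), axis=1)
median_rank = np.median(ranks, axis=0)
print(median_rank)  # e.g. [1. 2. 3.]
```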
Fig. 4
Three examples of MP-RAGE images from one subject. The reference image was acquired without voluntary motion and without motion correction, while the other two examples were acquired with voluntary motion (nodding/shaking), with and without motion correction. Image quality metrics are reported alongside the average observer evaluation scores ("Score"). Examples for T2 FLAIR, T1 TIRM, and T2 TSE are shown in Fig. S2. The reference-based IQMs (SSIM, PSNR, FSIM, VIF, LPIPS) and reference-free IQMs (AES, TG, NGS, GE, IE) are shown to the right of the images. For the reference image, the reference-based IQMs are calculated against itself and colored in light gray.
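As an illustration, two of the reference-based IQMs listed here (SSIM and PSNR) can be computed slice-wise with scikit-image as sketched below; this assumes volumes already normalized to [0, 1] and is not the authors' published pipeline.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def slicewise_iqms(image, reference):
    """Return per-slice SSIM and PSNR for a 3D volume and its
    motion-free reference (both assumed scaled to [0, 1])."""
    ssim_vals, psnr_vals = [], []
    for sl in range(image.shape[0]):
        img, ref = image[sl], reference[sl]
        ssim_vals.append(structural_similarity(ref, img, data_range=1.0))
        psnr_vals.append(peak_signal_noise_ratio(ref, img, data_range=1.0))
    return np.array(ssim_vals), np.array(psnr_vals)
```

The per-slice arrays can then be collapsed with a mean or worst-slice reduction, as in the Fig. 1 sketch.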
Fig. 5
Scatter plots visualizing the distribution of metric values against observer scores. Each blue dot represents one MP-RAGE image volume from the NRU dataset, and the corresponding regression line is shown in green. For statistically significant correlations (p < 0.05), the corresponding Spearman correlation coefficient is provided above the plot. The metrics were calculated with the pre-processing settings {Multiply, Percentile, Worst}. Non-integer observer scores result from averaging the scores across the four raters.
Fig. 6
Overview of the effect of pre-processing implementations on the correlation between IQMs and observers' scores on the MP-RAGEs from the NRU (A) and the CUBRIC dataset (B). We compare the different options for each pre-processing choice individually, while keeping the other two pre-processing settings at the standard {Multiply, Percentile, Worst}. The table only shows statistically significant correlations (p < 0.05), leaving the box empty if this requirement is not fulfilled. Values for FSIM, VIF, and LPIPS that are unavailable with the "Mean-Std" and "None" normalizations are marked with "*", as these metrics require a specific range of values (see Table 1). Similarly, these values are unavailable with the "Mask" setting, as the metrics are computed across the entire matrix. Overall, we found that the correlations with reference-based metrics are more consistent compared to the reference-free metrics, which largely display weak correlations with the observers' evaluations. The pre-processing steps that most affect the correlation values are: not applying a brain mask ("No Mask"), applying no normalization ("None"), or rescaling using the "Mean-Std" method.
Fig. 7
Intensity distributions of one example MP-RAGE image (blue) and its reference (green) for the normalization settings (A) "None", (B) "Min-max", (C) "Mean-std", and (D) "Percentile". The analysis is performed only within the brain mask. Example slices of the image and reference (same intensity window) are shown next to the histograms. Min-max normalization is affected by large outlier values and leads to a mismatch of intensity values between image and reference.
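This outlier sensitivity is easy to reproduce on synthetic data; the distribution parameters below are arbitrary and only meant to mimic the effect shown in the histograms.

```python
import numpy as np

rng = np.random.default_rng(0)
brain = rng.normal(100, 15, size=10_000)  # synthetic brain intensities
brain[0] = 1_000                          # one extreme outlier voxel

# Min-max: the single outlier stretches the range, compressing the bulk.
minmax = (brain - brain.min()) / (brain.max() - brain.min())

# Percentile: robust to the outlier, the bulk spans the usable range.
p1, p99 = np.percentile(brain, [1, 99])
percentile = (brain - p1) / (p99 - p1)

print(np.median(minmax))      # ~0.06: bulk squeezed near zero
print(np.median(percentile))  # ~0.5: bulk centered in [0, 1]
```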

References

    1. Tisdall M.D., Küstner T.: Metrics for motion and mr quality assessment. In: Advances in Magnetic Resonance Technology and Applications vol. 6, pp. 99–116. Elsevier, ??? (2022)
    1. Heckel R., Jacob M., Chaudhari A., Perlman O., Shimron E.: Deep Learning for Accelerated and Robust MRI Reconstruction: a Review (MAGMA. 2024. Jul;37(3):335–368) - PMC - PubMed
    1. Spieker V., Eichhorn H., Hammernik K., Rueckert D., Preibisch C., Karampinos D.C., Schnabel J.A.: Deep learning for retrospective motion correction in mri: A comprehensive review. IEEE Transactions on Medical Imaging 43(2), 846–859 (2024) - PubMed
    1. Breger A., Biguri A., Landman M.S., Selby I., Amberg N., Brunner E., Gröhl J., Hatamikia S., Karner C., Ning L., et al. : A study of why we need to reassess full reference image quality assessment with medical images. arXiv preprint arXiv:2405.19097 (2024) - PubMed
    1. Barrett H.H., Yao J., Rolland J.P., Myers K.J.: Model observers for assessment of image quality. Proceedings of the National Academy of Sciences 90(21), 9758–9765 (1993) - PMC - PubMed
