[Preprint]. 2025 May 23:arXiv:2412.18389v2.

Agreement of Image Quality Metrics with Radiological Evaluation in the Presence of Motion Artifacts

Elisa Marchetto et al. ArXiv.

Abstract

Object: Reliable image quality assessment is crucial for evaluating new motion correction methods for magnetic resonance imaging. We compare the performance of common reference-based and reference-free image quality metrics on unique datasets with real motion artifacts, and analyze the metrics' robustness to typical pre-processing techniques.

Materials and methods: We compared five reference-based and five reference-free metrics on brain data acquired with and without intentional motion (2D and 3D sequences). The metrics were recalculated seven times with varying pre-processing steps. Spearman correlation coefficients were computed to assess the relationship between image quality metrics and radiological evaluation.

Results: All reference-based metrics showed strong correlation with observer assessments. Among the reference-free metrics, Average Edge Strength offered the most promising results, consistently displaying stronger correlations across all sequences than the other reference-free metrics. The strongest correlation was achieved with percentile normalization and with restricting the metric evaluation to the skull-stripped brain region. In contrast, correlations were weaker when no brain mask was applied and when min-max or no normalization was used.

Discussion: Reference-based metrics reliably correlate with radiological evaluation across different sequences and datasets. Pre-processing significantly influences correlation values. Future research should focus on refining pre-processing techniques and exploring approaches for automated image quality evaluation.

Keywords: Artifacts; Data Quality; Magnetic Resonance Imaging; Metrics; Motion.

Conflict of interest statement

The authors declare no potential conflicts of interest.

Figures

Fig. 1
Different pre-processing choices are involved in calculating IQMs. We vary three of the common pre-processing steps, namely masking, normalization, and the reduction method for the IQM values. The brain mask was either ignored, multiplied with the images, or the metric was evaluated only within brain-mask voxels. Images were either not normalized or normalized with min-max, mean-std, or percentile normalization (except for FSIM, VIF, and LPIPS, which require specific image values, as shown in Table 1). IQM values across slices were reduced by calculating the mean value or taking the worst value of all slices (min/max depending on the IQM).
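As a concrete illustration of these three choices, the following is a minimal Python sketch of the masking, normalization, and slice-reduction options. Function names and the 1st/99th percentile bounds are our own assumptions, not the paper's published code.

```python
import numpy as np

def normalize(img, mode="percentile"):
    """Rescale image intensities; 'percentile' maps the 1st/99th
    percentiles to [0, 1] (assumed bounds, not specified here)."""
    if mode == "minmax":
        return (img - img.min()) / (img.max() - img.min())
    if mode == "meanstd":
        return (img - img.mean()) / img.std()
    if mode == "percentile":
        lo, hi = np.percentile(img, [1, 99])
        return (img - lo) / (hi - lo)
    return img  # mode == "none"

def apply_mask(img, mask, mode="multiply"):
    """'multiply' zeroes non-brain voxels but keeps the full matrix;
    'mask' restricts evaluation to brain voxels only (a 1D array),
    which is why metrics needing the full image matrix, such as
    FSIM/VIF/LPIPS, are unavailable in that setting."""
    if mode == "multiply":
        return img * mask
    if mode == "mask":
        return img[mask > 0]
    return img  # mode == "none"

def reduce_slices(values, reduction="worst", lower_is_better=False):
    """Collapse per-slice IQM values to a single number."""
    values = np.asarray(values)
    if reduction == "mean":
        return values.mean()
    # 'worst': max for lower-is-better IQMs, min otherwise
    return values.max() if lower_is_better else values.min()
```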
Fig. 2
Overview of the correlation analysis between image quality metrics and observer scores. Each 3D image volume was evaluated by one neuroradiologist for the CUBRIC dataset and by two neuroradiologists and two radiographers for the NRU dataset. For the latter, the scores were averaged with double weight on the more experienced neuroradiologists. IQMs were computed with various pre-processing choices (compare Fig. 1), as illustrated here using SSIM as an example. IQM values and observer scores of all images were then used to calculate the Spearman correlation coefficient, measuring the agreement between IQMs and observers.
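A minimal sketch of this analysis, assuming the per-rater scores for the NRU dataset are stored one row per image; the double weighting of the neuroradiologists follows the caption, while the toy numbers and variable names are ours.

```python
import numpy as np
from scipy.stats import spearmanr

# rows: images; columns: [neuroradiologist 1, neuroradiologist 2,
#                         radiographer 1, radiographer 2]
scores = np.array([[4, 4, 3, 4],
                   [2, 3, 2, 2],
                   [1, 1, 2, 1]], dtype=float)
weights = np.array([2, 2, 1, 1], dtype=float)   # double weight on neuroradiologists
observer_score = scores @ weights / weights.sum()  # weighted average per image

iqm_values = np.array([0.91, 0.72, 0.55])          # e.g. SSIM per image volume
rho, p = spearmanr(iqm_values, observer_score)
if p < 0.05:
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```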
Fig. 3
(A) Spearman correlation coefficient ρ between IQMs (x-axis) and observer scores for the four sequences of the NRU dataset (y-axis). Values are provided only for statistically significant correlations (p < 0.05), and values corresponding to a strong correlation (|ρ| > 0.6) are colored in blue and red. The metrics were calculated with the pre-processing settings {Multiply, Percentile, Worst}. (B) Median rank of each IQM, resulting from ranking the absolute values of the correlation coefficients for each sequence and taking the median across sequences.
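The median-rank summary in (B) can be computed along these lines; the correlation values below are illustrative placeholders, not the figure's numbers.

```python
import numpy as np
from scipy.stats import rankdata

# rho[i, j]: Spearman coefficient of IQM j with observer scores for sequence i
rho = np.array([[0.85, 0.62, 0.30],
                [0.78, 0.70, 0.41],
                [0.90, 0.55, 0.25],
                [0.81, 0.66, 0.38]])

# Rank IQMs within each sequence by |rho| (rank 1 = strongest correlation),
# then take the median rank across the four sequences.
ranks = rankdata(-np.abs(rho), axis=1)
median_rank = np.median(ranks, axis=0)
print(median_rank)  # e.g. [1. 2. 3.]
```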
Fig. 4
Three examples of MP-RAGE images from one subject. The reference image was acquired without voluntary motion and without motion correction, while the other two examples were acquired with voluntary motion (nodding/shaking), with and without motion correction. Image quality metrics are reported alongside the average observer evaluation scores ("Score"). Examples for T2 FLAIR, T1 TIRM, and T2 TSE are shown in Fig. S2. The reference-based IQMs (SSIM, PSNR, FSIM, VIF, LPIPS) and reference-free IQMs (AES, TG, NGS, GE, IE) are shown to the right of the images. For the reference image, the reference-based IQMs are calculated against itself and colored in light gray.
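As an illustration, two of the reference-based IQMs listed here (SSIM and PSNR) can be computed slice-wise with scikit-image as sketched below; this assumes volumes already normalized to [0, 1] and is not the authors' published pipeline.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def slicewise_iqms(image, reference):
    """Return per-slice SSIM and PSNR for a 3D volume and its
    motion-free reference (both assumed scaled to [0, 1])."""
    ssim_vals, psnr_vals = [], []
    for sl in range(image.shape[0]):
        img, ref = image[sl], reference[sl]
        ssim_vals.append(structural_similarity(ref, img, data_range=1.0))
        psnr_vals.append(peak_signal_noise_ratio(ref, img, data_range=1.0))
    return np.array(ssim_vals), np.array(psnr_vals)
```

The per-slice arrays can then be collapsed with a mean or worst-slice reduction, as in the Fig. 1 sketch.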
Fig. 5
Scatter plots visualizing the distribution of metric values against observer scores. Each blue dot represents one MP-RAGE image volume from the NRU dataset, and the corresponding regression line is shown in green. For statistically significant correlations (p < 0.05), the corresponding Spearman correlation coefficient is provided above the plot. The metrics were calculated with the pre-processing settings {Multiply, Percentile, Worst}. Non-integer observer scores result from averaging the scores across the four raters.
Fig. 6
Overview of the effect of pre-processing implementations on the correlation between IQMs and observers' scores on the MP-RAGEs from the NRU (A) and the CUBRIC dataset (B). We compare the different options for each pre-processing choice individually, while keeping the other two pre-processing settings at the standard {Multiply, Percentile, Worst}. The table only shows statistically significant correlations (p < 0.05), leaving the box empty if this requirement is not fulfilled. Values for FSIM, VIF, and LPIPS that are unavailable with the "Mean-Std" and "None" normalizations are marked with "*", as these metrics require a specific range of values (see Table 1). Similarly, these values are unavailable with the "Mask" setting, as the metrics are computed across the entire matrix. Overall, we found that the correlations with reference-based metrics are more consistent compared to the reference-free metrics, which largely display weak correlations with the observers' evaluations. The pre-processing steps that most affect the correlation values are: not applying a brain mask ("No Mask"), applying no normalization ("None"), or rescaling using the "Mean-Std" method.
Fig. 7
Intensity distributions of one example MP-RAGE image (blue) and its reference (green) for the normalization settings (A) "None", (B) "Min-max", (C) "Mean-std", and (D) "Percentile". The analysis is performed only within the brain mask. Example slices of the image and reference (same intensity window) are shown next to the histograms. Min-max normalization is affected by large outlier values and leads to a mismatch of intensity values between image and reference.
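This outlier sensitivity is easy to reproduce on synthetic data; the distribution parameters below are arbitrary and only meant to mimic the effect shown in the histograms.

```python
import numpy as np

rng = np.random.default_rng(0)
brain = rng.normal(100, 15, size=10_000)  # synthetic brain intensities
brain[0] = 1_000                          # one extreme outlier voxel

# Min-max: the single outlier stretches the range, compressing the bulk.
minmax = (brain - brain.min()) / (brain.max() - brain.min())

# Percentile: robust to the outlier, the bulk spans the usable range.
p1, p99 = np.percentile(brain, [1, 99])
percentile = (brain - p1) / (p99 - p1)

print(np.median(minmax))      # ~0.06: bulk squeezed near zero
print(np.median(percentile))  # ~0.5: bulk centered in [0, 1]
```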

References

    1. Tisdall M.D., Küstner T.: Metrics for motion and mr quality assessment. In: Advances in Magnetic Resonance Technology and Applications vol. 6, pp. 99–116. Elsevier, ??? (2022)
    1. Heckel R., Jacob M., Chaudhari A., Perlman O., Shimron E.: Deep Learning for Accelerated and Robust MRI Reconstruction: a Review (MAGMA. 2024. Jul;37(3):335–368) - PMC - PubMed
    1. Spieker V., Eichhorn H., Hammernik K., Rueckert D., Preibisch C., Karampinos D.C., Schnabel J.A.: Deep learning for retrospective motion correction in mri: A comprehensive review. IEEE Transactions on Medical Imaging 43(2), 846–859 (2024) - PubMed
    1. Breger A., Biguri A., Landman M.S., Selby I., Amberg N., Brunner E., Gröhl J., Hatamikia S., Karner C., Ning L., et al. : A study of why we need to reassess full reference image quality assessment with medical images. arXiv preprint arXiv:2405.19097 (2024) - PubMed
    1. Barrett H.H., Yao J., Rolland J.P., Myers K.J.: Model observers for assessment of image quality. Proceedings of the National Academy of Sciences 90(21), 9758–9765 (1993) - PMC - PubMed
