Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets

Sci Data. 2024 Jan 23;11(1):115. doi: 10.1038/s41597-023-02421-7.

Chiara Marzi¹,², Marco Giannelli³, Andrea Barucci², Carlo Tessa⁴, Mario Mascalchi⁵,⁶, Stefano Diciotti⁷,⁸

Affiliations

¹ Department of Statistics, Computer Science and Applications "Giuseppe Parenti", University of Florence, 50134, Florence, Italy.
² "Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy.
³ Unit of Medical Physics, Pisa University Hospital "Azienda Ospedaliero-Universitaria Pisana", 56126, Pisa, Italy.
⁴ Radiology Unit Apuane e Lunigiana, Azienda USL Toscana Nord Ovest, 54100, Massa, Italy.
⁵ Department of Experimental and Clinical Biomedical Sciences "Mario Serio", University of Florence, 50139, Florence, Italy.
⁶ Division of Epidemiology and Clinical Governance, Institute for Study, Prevention and netwoRk in Oncology (ISPRO), 50139, Florence, Italy.
⁷ Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" - DEI, University of Bologna, 47522, Cesena, Italy. stefano.diciotti@unibo.it.
⁸ Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, 40121, Bologna, Italy. stefano.diciotti@unibo.it.

PMID: 38263181
PMCID: PMC10805868
DOI: 10.1038/s41597-023-02421-7

Chiara Marzi et al. Sci Data. 2024.

Abstract

Pooling publicly available MRI data from multiple sites makes it possible to assemble large groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. Harmonization of multicenter data is necessary to reduce the confounding effect of non-biological sources of variability in the data. However, when applied to the entire dataset before machine learning, harmonization leads to data leakage, because information from outside the training set may affect model building and potentially lead to overestimated performance. We propose 1) a measurement of the efficacy of data harmonization and 2) a harmonizer transformer, i.e., an implementation of the ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T₁-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced, and we showed the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage by design.
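
The leakage argument in the abstract can be illustrated with a minimal sketch. The class below is a hypothetical toy stand-in, not the authors' harmonizer transformer and not ComBat: it only removes a per-site mean offset from a single feature. The names `SiteMeanCenterer`, `fit`, and `transform` and all numeric values are assumptions made for illustration. The point it shows is that the site statistics are estimated on the training data only and then applied unchanged to the test data.

```python
from statistics import mean


class SiteMeanCenterer:
    """Toy stand-in for a harmonization step (NOT ComBat).

    It removes the per-site mean offset of one feature, using
    statistics learned from the data passed to fit() only.
    """

    def fit(self, values, sites):
        # Estimate the overall mean and each site's offset from it.
        self.grand_mean_ = mean(values)
        by_site = {}
        for value, site in zip(values, sites):
            by_site.setdefault(site, []).append(value)
        self.offsets_ = {
            site: mean(site_values) - self.grand_mean_
            for site, site_values in by_site.items()
        }
        return self

    def transform(self, values, sites):
        # Apply the stored offsets; nothing is re-estimated here.
        return [value - self.offsets_[site] for value, site in zip(values, sites)]


# Hypothetical single-feature data (e.g., a cortical thickness value) from two sites.
train_values = [2.0, 2.2, 3.0, 3.2]
train_sites = ["A", "A", "B", "B"]
test_values = [2.1, 3.1]
test_sites = ["A", "B"]

# Leakage-free order: fit on the training set, then transform both sets.
harmonizer = SiteMeanCenterer().fit(train_values, train_sites)
train_harmonized = harmonizer.transform(train_values, train_sites)
test_harmonized = harmonizer.transform(test_values, test_sites)
```

Fitting the same object on the concatenation of training and test values would let test-set information influence the offsets, which is the leakage the abstract describes. Inside a cross-validation loop, the fit step would be repeated on the training portion of each fold.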

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Age distributions. Age distributions of participants for CHILDHOOD, ADOLESCENCE, ADULTHOOD, and LIFESPAN meta-datasets, grouped by single-center dataset and sorted by median age.

**Fig. 2**
3D box-counting for computation of the FD. An example of the 3D box-counting algorithm that uses an automated selection of the fractal scaling window through the *fractalbrain* toolkit. *N(s)* is the average number of 3D cubes of side s needed to fully enclose the brain structure computed using 20 uniformly distributed random offsets to the grid origin. The regression line within the optimal fractal scaling window, whose slope (sign changed) is the FD, is depicted in red.
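
The box-counting idea in this caption can be sketched in a few lines. This is a simplified illustration, not the *fractalbrain* implementation: it uses a single grid origin instead of averaging over 20 random offsets, and it takes the box sizes as given instead of selecting the fractal scaling window automatically. The function names and the test object are assumptions.

```python
import math


def box_count(voxels, side):
    """Number of cubes of the given side that contain at least one voxel."""
    return len({(x // side, y // side, z // side) for x, y, z in voxels})


def fractal_dimension(voxels, sides):
    """Negative slope of the least-squares line of log N(s) versus log s."""
    log_s = [math.log(s) for s in sides]
    log_n = [math.log(box_count(voxels, s)) for s in sides]
    mean_s = sum(log_s) / len(log_s)
    mean_n = sum(log_n) / len(log_n)
    covariance = sum((a - mean_s) * (b - mean_n) for a, b in zip(log_s, log_n))
    variance = sum((a - mean_s) ** 2 for a in log_s)
    return -covariance / variance


# Sanity check on a flat 8 x 8 sheet of voxels, whose dimension should be 2.
sheet = [(x, y, 0) for x in range(8) for y in range(8)]
fd_sheet = fractal_dimension(sheet, sides=[1, 2, 4, 8])
```

For the sheet, the counts are 64, 16, 4, and 1 for sides 1, 2, 4, and 8, so the log-log points lie on a line of slope -2 and the estimated FD is 2 up to floating-point error.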

**Fig. 3**
Machine learning pipeline. A pipeline represents the entire data workflow, combining all transformation steps and machine learning model training. It is essential to automate an end-to-end training/test process without any form of data leakage and improve reproducibility, ease of deployment, and code reuse, especially when complex validation schemes are needed.

**Fig. 4**
Overview of the analysis of simulated data for each experiment. After an external hold-out, we computed the performance of a site prediction classifier trained using (a) the *harmonizer* transformer within the machine learning pipeline (internal not leaked test set) and (b) harmonizing all data with *neuroHarmonize* before imaging site/age prediction (internal leaked test set). Secondly, we compared these performances with that observed on an external test set never used for harmonization and training.

**Fig. 5**
Imaging site prediction results with CT and FD simulated data. We report the difference between the average balanced accuracy obtained in the external test set and that obtained in the internal test sets (dotted line for the leaked internal test set and solid line for the not leaked internal test set), together with Cohen’s d effect size, vs. the number of participants per single-center dataset n. The cross marker indicates a significant difference between balanced accuracy distributions (one-tailed paired t-test, Bonferroni-adjusted p-value < 10⁻⁹ and < 10⁻¹⁰ for CT and FD, respectively). The colors and line types in Cohen’s d plots are consistent with those employed in the other plots.

**Fig. 6**
Age prediction results with CT and FD simulated data. We report the difference between the average MAE obtained in the external test set and that obtained in the internal test sets (dotted line for the leaked internal test set and solid line for the not leaked internal test set), together with Cohen’s d effect size, vs. the number of participants per single-center dataset n. The cross marker indicates a significant difference between MAE distributions (see Tables 5, 6 for details). The colors and line types in Cohen’s d plots are consistent with those employed in the other plots.

**Fig. 7**
Boxplot of the average CT of the cerebral cortex. The boxplots of the average CT of the cerebral cortex without harmonization are shown for the CHILDHOOD, ADOLESCENCE, ADULTHOOD, and LIFESPAN meta-datasets.

**Fig. 8**
Boxplot of the average FD of the cerebral cortex. The boxplots of the FD of the cerebral cortex without harmonization are shown for the CHILDHOOD, ADOLESCENCE, ADULTHOOD, and LIFESPAN meta-datasets.

**Fig. 9**
Confusion matrices of site prediction using CT features. Each confusion matrix was normalized for the number of subjects belonging to each site. In this way, the sum of the matrix cells of each row gives 1. The confusion matrix obtained using the *harmonizer* within the machine learning pipeline seems similar to that obtained by harmonizing all the data with *neuroHarmonize* before imaging site prediction, even though the model is built on training data only and then applied to test data.

**Fig. 10**
Confusion matrices of site prediction using FD features. Each confusion matrix was normalized for the number of subjects belonging to each site. In this way, the sum of the matrix cells of each row gives 1. The confusion matrix obtained using the *harmonizer* within the machine learning pipeline seems similar to that obtained by harmonizing all the data with *neuroHarmonize* before imaging site prediction, even though the model is built on training data only and then applied to test data.

**Fig. 11**
Confusion matrices of site prediction using CT and FD features in the LIFESPAN meta-dataset. Each confusion matrix was normalized for the number of subjects belonging to each site. In this way, the sum of the matrix cells of each row gives 1. The confusion matrix obtained using the *harmonizer* within the machine learning pipeline seems similar to that obtained by harmonizing all the data with *neuroHarmonize* before imaging site prediction, even though the model is built on training data only and then applied to test data.
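
The row normalization described for the confusion matrices in Figs. 9 to 11 can be sketched as follows. This is an illustrative example with hypothetical site labels and predictions, not the authors' code. It also computes balanced accuracy in its usual sense, the mean of the per-class recalls, which equals the mean of the diagonal of the row-normalized matrix; whether the study computed it exactly this way is not stated in these captions.

```python
def row_normalized_confusion(y_true, y_pred, labels):
    """Confusion matrix whose rows (true sites) each sum to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        counts[index[true_label]][index[predicted_label]] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        # A site with no subjects yields a row of zeros instead of dividing by 0.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def balanced_accuracy(normalized_matrix):
    """Mean per-class recall, i.e., mean of the normalized diagonal."""
    size = len(normalized_matrix)
    return sum(normalized_matrix[i][i] for i in range(size)) / size


# Hypothetical true and predicted imaging sites for six subjects.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
matrix = row_normalized_confusion(y_true, y_pred, labels=["A", "B"])
score = balanced_accuracy(matrix)
```

Here the rows are [0.75, 0.25] for site A and [0.5, 0.5] for site B, so the balanced accuracy is 0.625. Normalizing by row keeps sites with many subjects from dominating the visual comparison.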

**Fig. 12**
Scatterplot of the average CT of the cerebral cortex vs. age. The plot of the average CT of the cerebral cortex vs. age is shown for the CHILDHOOD, ADOLESCENCE, ADULTHOOD, and LIFESPAN meta-datasets without and with harmonization using the *harmonizer* transformer. In the latter case, we considered only the first CV among the 100 repetitions. Specifically, for each subject, we plotted the harmonized value obtained in the fold when the subject was included in the test set.

**Fig. 13**
Scatterplot of the FD of the cerebral cortex vs. age. The plot of the FD of the cerebral cortex vs. age is shown for the CHILDHOOD, ADOLESCENCE, ADULTHOOD, and LIFESPAN meta-datasets without and with harmonization using the *harmonizer* transformer. In the latter case, we considered only the first CV among the 100 repetitions. Specifically, for each subject, we plotted the harmonized value obtained in the fold when the subject was included in the test set.

Cited by

Deep Learning for MRI Segmentation and Molecular Subtyping in Glioblastoma: Critical Aspects from an Emerging Field.
Bonada M, Rossi LF, Carone G, Panico F, Cofano F, Fiaschi P, Garbossa D, Di Meco F, Bianconi A. Biomedicines. 2024 Aug 16;12(8):1878. doi: 10.3390/biomedicines12081878. PMID: 39200342. Free PMC article. Review.
A critical assessment of artificial intelligence in magnetic resonance imaging of cancer.
Wu C, Andaloussi MA, Hormuth DA 2nd, Lima EABF, Lorenzo G, Stowers CE, Ravula S, Levac B, Dimakis AG, Tamir JI, Brock KK, Chung C, Yankeelov TE. Npj Imaging. 2025;3(1):15. doi: 10.1038/s44303-025-00076-0. Epub 2025 Apr 9. PMID: 40226507. Free PMC article. Review.
Editorial: Methods and application in fractal analysis of neuroimaging data.
Porcaro C, Diciotti S, Madan CR, Marzi C. Front Hum Neurosci. 2024 Jul 10;18:1453284. doi: 10.3389/fnhum.2024.1453284. eCollection 2024. PMID: 39050380. Free PMC article. No abstract available.
Lifespan reference curves for harmonizing multi-site regional brain white matter metrics from diffusion MRI.
Zhu AH, Nir TM, Javid S, Villalon-Reina JE, Rodrigue AL, Strike LT, de Zubicaray GI, McMahon KL, Wright MJ, Medland SE, Blangero J, Glahn DC, Kochunov P, Håberg AK, Thompson PM, Jahanshad N; Alzheimer’s Disease Neuroimaging Initiative. bioRxiv [Preprint]. 2024 Mar 1:2024.02.22.581646. doi: 10.1101/2024.02.22.581646. Update in: Sci Data. 2025 May 6;12(1):748. doi: 10.1038/s41597-025-05028-2. PMID: 38463962. Free PMC article. Updated. Preprint.
Superpixel-ComBat modeling: A joint approach for harmonization and characterization of inter-scanner variability in T1-weighted images.
Chen CL, Torbati ME, Minhas DS, Laymon CM, Hwang SJ, Bilgel M, Crainiceanu A, Jin H, Luo W, Maillard P, Fletcher E, Crainiceanu CM, DeCarli CS, Aizenstein HJ, Tudorascu DL. Imaging Neurosci (Camb). 2024 Oct 3;2:imag-2-00306. doi: 10.1162/imag_a_00306. eCollection 2024. PMID: 40800451. Free PMC article.
