Brain Inform. 2023 Nov 25;10(1):32.
doi: 10.1186/s40708-023-00210-x.

Effect of data harmonization of multicentric dataset in ASD/TD classification

Giacomo Serra et al. Brain Inform. 2023.

Abstract

Machine Learning (ML) is now an essential tool in the analysis of Magnetic Resonance Imaging (MRI) data, particularly for identifying brain correlates of neurological and neurodevelopmental disorders. ML requires training datasets of appropriate size, which in neuroimaging are typically obtained by collecting data from multiple acquisition centers. However, analyzing large multicentric datasets can introduce bias due to differences between acquisition centers. ComBat harmonization is commonly used to address batch effects, but it can lead to data leakage when the entire dataset is used to estimate the model parameters. In this study, structural and functional MRI data from the Autism Brain Imaging Data Exchange (ABIDE) collection were used to classify subjects with Autism Spectrum Disorders (ASD) against Typically Developing (TD) controls. We compared three scenarios: the classical approach (external harmonization), in which harmonization is performed before the train/test split; a harmonization estimated only on the train set (internal harmonization); and the dataset with no harmonization. Harmonization using the whole dataset achieved higher discrimination performance, while non-harmonized data and harmonization using only the train set showed similar results, for both structural and connectivity features. We also showed that the higher performance of external harmonization is not due to the larger sample available for estimating the model; hence, the improved performance obtained with the entire dataset may be ascribed to data leakage. To prevent this leakage, it is recommended to define the harmonization model using the train set alone.
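
The difference between external and internal harmonization can be illustrated with a minimal sketch. It is not the ComBat model used in the study (which relies on empirical Bayes estimation and covariates); it is a simplified per-site location/scale adjustment on synthetic data, and the names `fit_site_params` and `apply_site_params`, the two sites, and all numeric values are assumptions made for illustration only.

```python
import random
import statistics
from collections import defaultdict


def fit_site_params(values, sites):
    """Estimate the harmonization parameters from the given samples only."""
    by_site = defaultdict(list)
    for value, site in zip(values, sites):
        by_site[site].append(value)
    grand_mean = statistics.fmean(values)
    pooled_sd = statistics.pstdev(values)
    # Per-site mean and standard deviation (fall back to 1.0 if a site is constant).
    per_site = {
        site: (statistics.fmean(vals), statistics.pstdev(vals) or 1.0)
        for site, vals in by_site.items()
    }
    return grand_mean, pooled_sd, per_site


def apply_site_params(values, sites, fitted):
    """Remove each site's offset and scale, then restore the pooled scale and mean."""
    grand_mean, pooled_sd, per_site = fitted
    harmonized = []
    for value, site in zip(values, sites):
        site_mean, site_sd = per_site[site]
        harmonized.append((value - site_mean) / site_sd * pooled_sd + grand_mean)
    return harmonized


def take(seq, indices):
    return [seq[i] for i in indices]


# Synthetic single feature (hypothetical values) acquired at two sites.
random.seed(0)
sites = ["A"] * 50 + ["B"] * 50
values = [random.gauss(2.5, 0.2) for _ in range(50)]
values += [random.gauss(3.0, 0.4) for _ in range(50)]

# Train/test split: 70 training samples, 30 test samples.
indices = list(range(len(values)))
random.shuffle(indices)
train_idx, test_idx = indices[:70], indices[70:]
train_x, train_s = take(values, train_idx), take(sites, train_idx)
test_x, test_s = take(values, test_idx), take(sites, test_idx)

# Internal harmonization: parameters are estimated on the train set only
# and then applied unchanged to the test set.
internal = fit_site_params(train_x, train_s)
train_internal = apply_site_params(train_x, train_s, internal)
test_internal = apply_site_params(test_x, test_s, internal)

# External harmonization: parameters are estimated on the whole dataset,
# so the test samples influence the transformation applied to them.
external = fit_site_params(values, sites)
test_external = apply_site_params(test_x, test_s, external)
```

In the internal variant the test samples never enter `fit_site_params`, so the transformation applied to them is fixed before they are seen; in the external variant the same test samples contribute to the estimated parameters, which is the leakage path the abstract describes.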

Keywords: ABIDE; Autism spectrum disorder; Harmonization; Machine learning; Multi-site data.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Effect of external (center) and internal (right) harmonization approaches on a structural feature, left hemisphere cortical thickness, compared with the non-harmonized (left) scenario. The boxplots display the distributions of the feature, grouped by site; sites are sorted by increasing median age.
Fig. 2
Effect of external (center) and internal (right) harmonization approaches on a connectivity feature (1385), compared with the non-harmonized (left) scenario. The boxplots display the distributions of the feature, grouped by site; sites are sorted by increasing median age.
Fig. 3
ASD/TD classification results for different feature sets and harmonization strategies.
Fig. 4
Structural (A, B) and connectivity (C, D) features of the minSC dataset. Comparison between the no-harmonization and internal-harmonization criteria (A, C) and between the external-harmonization and internal-harmonization criteria (B, D). The 30% most important features for the internal-harmonization method are shown, together with the importance values of the same features for the no-harmonization method.
