Detect and correct bias in multi-site neuroimaging datasets
- PMID: 33152602
- DOI: 10.1016/j.media.2020.101879
Abstract
The desire to train complex machine learning algorithms and to increase the statistical power in association studies drives neuroimaging research to use ever-larger datasets. The most obvious way to increase sample size is by pooling scans from independent studies. However, simple pooling is often ill-advised as selection, measurement, and confounding biases may creep in and yield spurious correlations. In this work, we combine 35,320 magnetic resonance images of the brain from 17 studies to examine bias in neuroimaging. In the first experiment, Name That Dataset, we provide empirical evidence for the presence of bias by showing that scans can be correctly assigned to their respective dataset with 71.5% accuracy. Given such evidence, we take a closer look at confounding bias, which is often viewed as the main shortcoming in observational studies. In practice, we neither know all potential confounders nor do we have data on them. Hence, we model confounders as unknown, latent variables. Kolmogorov complexity is then used to decide whether the confounded or the causal model provides the simplest factorization of the graphical model. Finally, we present methods for dataset harmonization and study their ability to remove bias in imaging features. In particular, we propose an extension of the recently introduced ComBat algorithm to control for global variation across image features, inspired by adjusting for unknown population stratification in genetics. Our results demonstrate that harmonization can reduce dataset-specific information in image features. Further, confounding bias can be reduced and even turned into a causal relationship. However, harmonization also requires caution as it can easily remove relevant subject-specific information. Code is available at https://github.com/ai-med/Dataset-Bias.
Keywords: Bias; Big data; Causal inference; Harmonization; MRI.
Copyright © 2020 Elsevier B.V. All rights reserved.
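The two ideas in the abstract, testing whether a scan's dataset of origin can be predicted from its features and then harmonizing to reduce that signal, can be illustrated with a small sketch. This is not the authors' pipeline and not the ComBat algorithm: it assumes synthetic data, a nearest-centroid classifier, and a plain per-site location-scale adjustment without empirical Bayes shrinkage or covariates. All function names (`make_sites`, `name_that_dataset_accuracy`, `location_scale_harmonize`) are hypothetical.

```python
import numpy as np

def make_sites(n_sites=3, n_per_site=200, n_features=5, seed=0):
    """Synthetic 'image features' with a site-specific shift and scale (made-up data)."""
    rng = np.random.default_rng(seed)
    X, site = [], []
    for s in range(n_sites):
        shift = rng.normal(0.0, 2.0, size=n_features)   # additive site effect
        scale = rng.uniform(0.5, 1.5, size=n_features)  # multiplicative site effect
        X.append(shift + scale * rng.normal(size=(n_per_site, n_features)))
        site.append(np.full(n_per_site, s))
    return np.vstack(X), np.concatenate(site)

def name_that_dataset_accuracy(X, site, seed=0):
    """Hold-out accuracy of a nearest-centroid classifier that predicts the site."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    train, test = idx[: len(X) // 2], idx[len(X) // 2:]
    labels = np.unique(site)
    # One centroid per site, estimated on the training half only.
    centroids = np.stack([X[train][site[train] == s].mean(axis=0) for s in labels])
    # Squared Euclidean distance from every test sample to every centroid.
    dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = labels[dists.argmin(axis=1)]
    return float((pred == site[test]).mean())

def location_scale_harmonize(X, site):
    """Standardize each feature within each site, then restore the pooled mean and std.

    Simplified stand-in for harmonization: no empirical Bayes, no covariates.
    """
    out = np.empty_like(X, dtype=float)
    pooled_mean, pooled_std = X.mean(axis=0), X.std(axis=0)
    for s in np.unique(site):
        m = site == s
        out[m] = (X[m] - X[m].mean(axis=0)) / X[m].std(axis=0)
    return out * pooled_std + pooled_mean

X, site = make_sites()
acc_before = name_that_dataset_accuracy(X, site)
acc_after = name_that_dataset_accuracy(location_scale_harmonize(X, site), site)
print(f"site accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setting, accuracy well above chance (1/3 for three sites) before harmonization signals dataset-specific information, and accuracy near chance afterwards signals that it was reduced. The sketch also shows the caveat raised in the abstract: the adjustment removes every between-site difference in mean and variance, including any that reflect real differences between the populations studied.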
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Grants and funding
- U54 MH091657/MH/NIMH NIH HHS/United States
- P20 RR021938/RR/NCRR NIH HHS/United States
- U01 DA041174/DA/NIDA NIH HHS/United States
- U01 DA041156/DA/NIDA NIH HHS/United States
- U01 DA041106/DA/NIDA NIH HHS/United States
- U01 DA041148/DA/NIDA NIH HHS/United States
- CIHR/Canada
- U01 DA041089/DA/NIDA NIH HHS/United States
- U24 RR021382/RR/NCRR NIH HHS/United States
- U01 DA041134/DA/NIDA NIH HHS/United States
- U24 DA041147/DA/NIDA NIH HHS/United States
- R01 AG021910/AG/NIA NIH HHS/United States
- U01 DA041048/DA/NIDA NIH HHS/United States
- U01 DA041093/DA/NIDA NIH HHS/United States
- U01 AG024904/AG/NIA NIH HHS/United States
- U01 DA041022/DA/NIDA NIH HHS/United States
- U01 DA041025/DA/NIDA NIH HHS/United States
- P20 GM103472/GM/NIGMS NIH HHS/United States
- U01 DA041120/DA/NIDA NIH HHS/United States
- U24 DA041123/DA/NIDA NIH HHS/United States
- U01 DA041028/DA/NIDA NIH HHS/United States
- U01 DA041117/DA/NIDA NIH HHS/United States