Cell Rep. 2024 Jan 23;43(1):113597.

doi: 10.1016/j.celrep.2023.113597. Epub 2023 Dec 29.

Performance reserves in brain-imaging-based phenotype prediction

Marc-Andre Schulz¹, Danilo Bzdok², Stefan Haufe³, John-Dylan Haynes⁴, Kerstin Ritter⁵

Affiliations

¹ Charité - Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Berlin, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany. Electronic address: marc-andre.schulz@charite.de.
² McConnell Brain Imaging Centre (BIC), Montreal Neurological Institute (MNI), Faculty of Medicine, McGill University, Montreal, QC, Canada; Department of Biomedical Engineering, Faculty of Medicine, McGill University, Montreal, QC, Canada; Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada.
³ Bernstein Center for Computational Neuroscience, Berlin, Germany; Technische Universität Berlin, Berlin, Germany; Physikalisch-Technische Bundesanstalt, Berlin, Germany; Charité - Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Neurology, Berlin Center for Advanced Neuroimaging, Berlin, Germany.
⁴ Bernstein Center for Computational Neuroscience, Berlin, Germany; Charité - Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Neurology, Berlin Center for Advanced Neuroimaging, Berlin, Germany.
⁵ Charité - Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Berlin, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany.

PMID: 38159275
PMCID: PMC11215805
DOI: 10.1016/j.celrep.2023.113597

Marc-Andre Schulz et al. Cell Rep. 2024.

Abstract

This study examines the impact of sample size on predicting cognitive and mental health phenotypes from brain imaging via machine learning. Our analysis shows a 3- to 9-fold improvement in prediction performance when sample size increases from 1,000 to 1 M participants. However, despite this increase, the data suggest that prediction accuracy remains worryingly low and far from fully exploiting the predictive potential of brain imaging data. Additionally, we find that integrating multiple imaging modalities boosts prediction accuracy, often equivalent to doubling the sample size. Interestingly, the most informative imaging modality often varied with increasing sample size, emphasizing the need to consider multiple modalities. Despite significant performance reserves for phenotype prediction, achieving substantial improvements may necessitate prohibitively large sample sizes, thus casting doubt on the practical or clinical utility of machine learning in some areas of neuroimaging.

Keywords: CP: Neuroscience; accuracy limits; brain imaging; machine learning; multimodal imaging; sample size; scaling behavior.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1. Learning curves for neuroimaging-based phenotype prediction precisely follow a power-law function**
(A) Prediction accuracy scales with the number of training samples. The precise nature of this relationship can be described by a simple power law [α n^−β + γ]. (A.1) For instance, when predicting fluid intelligence from rfMRI data using ridge regression, out-of-sample accuracy (blue) closely followed the fitted power law (red). (A.2) We observed stable and continuous improvements in accuracy with increasing sample size, i.e., approximately linear scaling of prediction accuracy with log(n). (A.3 and A.4) Residuals of the power-law fit gave no indication of systematic deviations between measured accuracy and fitted power law. (B) Power-law scaling was observed in all evaluated prediction tasks (i.e., combinations of imaging modality and target phenotype), with a goodness-of-fit R² between measured learning curve and power law of on average 0.990 (SD = 0.015, min = 0.902). (C) Learning curve extrapolation predicted accuracy achievable on unseen larger samples. Shown are projected gains in prediction accuracy derived from learning curve extrapolation on the y axis in relation to observed gains in prediction accuracy on the x axis. Both were derived by doubling the training sample size from 8,000 to 16,000. Error bars indicate standard error of the mean (SEM).
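
The power-law form α n^−β + γ from panel (A) can be fitted and extrapolated with a few lines of code. The following is a minimal sketch, not the authors' fitting procedure: it assumes a grid scan over β with an ordinary least-squares solve for α and γ at each step, and the learning-curve values are synthetic numbers chosen for illustration. The names `fit_power_law` and `predict` are hypothetical.

```python
import numpy as np

def fit_power_law(n, acc, betas=np.linspace(0.01, 2.0, 400)):
    """Fit acc ~ alpha * n**(-beta) + gamma.

    For a fixed beta the model is linear in (alpha, gamma), so we scan a
    grid of beta values and solve a least-squares problem for each one.
    """
    n = np.asarray(n, dtype=float)
    acc = np.asarray(acc, dtype=float)
    best = None
    for beta in betas:
        X = np.column_stack([n ** (-beta), np.ones_like(n)])
        coef, *_ = np.linalg.lstsq(X, acc, rcond=None)
        sse = float(np.sum((X @ coef - acc) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], beta, coef[1])
    _, alpha, beta, gamma = best
    return alpha, beta, gamma

def predict(n, alpha, beta, gamma):
    """Evaluate the power law at sample size(s) n."""
    return alpha * np.asarray(n, dtype=float) ** (-beta) + gamma

# Synthetic, noise-free learning curve (illustrative values, not from the paper).
sizes = np.array([250, 500, 1000, 2000, 4000, 8000])
true_alpha, true_beta, true_gamma = -2.0, 0.5, 0.30
acc = predict(sizes, true_alpha, true_beta, true_gamma)

# Fit on sizes up to 8,000, then extrapolate to a doubled sample of 16,000.
alpha, beta, gamma = fit_power_law(sizes, acc)
projected = predict(16000, alpha, beta, gamma)
print(f"beta={beta:.3f}, gamma={gamma:.3f}, projected accuracy at 16,000: {projected:.4f}")
```

In this parameterization γ is the accuracy approached as n grows without bound, and a negative α yields a curve that rises toward γ.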

**Figure 2. Linear models are operating far below ceiling accuracy for most target phenotype predictions**
Learning curves show the collective results obtained from regularized linear models using T1, DWI, and rfMRI data to predict sociodemographic, cognitive function, behavior/lifestyle, and mental health phenotypes. Training datasets were subsampled from the UK Biobank up to a size of 32,000 participants. Learning curves were extrapolated beyond 32,000 participants. To indicate extrapolation uncertainty, each colored line represents a power-law fit based on a bootstrap sample of observed accuracies. Observed prediction accuracies are marked black; majority classifier/median regression baselines are marked dashed gray. Blue vertical lines indicate the sample size of the Human Connectome Project (1,000), the imaging sample size goal of the UK Biobank (100,000), and the proposed Million Brain Initiative (1 M). Error bars indicate SEM.
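
The bootstrap approach to extrapolation uncertainty described in this caption can be sketched as follows. This is an illustrative assumption about how such a procedure might look, not the authors' code: the repeated accuracy measurements are synthetic, the resampling unit (repetitions per sample size) is assumed, and `fit_power_law` is a hypothetical helper that scans β on a grid.

```python
import numpy as np

def fit_power_law(n, acc, betas=np.linspace(0.05, 2.0, 196)):
    # Scan beta; for each value solve the linear least-squares problem in (alpha, gamma).
    best = None
    for beta in betas:
        X = np.column_stack([n ** (-beta), np.ones_like(n)])
        coef, *_ = np.linalg.lstsq(X, acc, rcond=None)
        sse = float(np.sum((X @ coef - acc) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], beta, coef[1])
    return best[1:]  # (alpha, beta, gamma)

rng = np.random.default_rng(0)
sizes = np.array([250, 500, 1000, 2000, 4000, 8000, 16000, 32000], dtype=float)
n_reps, n_boot, target_n = 20, 200, 1_000_000

# Synthetic "observed" accuracies: n_reps noisy repetitions per training size.
clean = -2.0 * sizes ** -0.5 + 0.30
observed = clean + rng.normal(0.0, 0.005, size=(n_reps, sizes.size))

projections = []
for _ in range(n_boot):
    # Resample repetitions with replacement, average, refit, and extrapolate.
    idx = rng.integers(0, n_reps, size=n_reps)
    mean_curve = observed[idx].mean(axis=0)
    alpha, beta, gamma = fit_power_law(sizes, mean_curve)
    projections.append(alpha * target_n ** (-beta) + gamma)

projections = np.array(projections)
lo, hi = np.percentile(projections, [2.5, 97.5])
print(f"95% bootstrap interval for accuracy at n=1M: [{lo:.3f}, {hi:.3f}]")
```

Each bootstrap fit corresponds to one colored line in the figure; the spread of the extrapolated values gives a rough sense of how uncertain the projection is at sample sizes far beyond the observed range.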

**Figure 3. Multifold gains in prediction performance are projected for behavioral and mental health phenotypes when moving from 1,000 to 1 M samples**
Shown is the relative increase in prediction accuracy per modality and target phenotype derived from learning curve extrapolation on regularized linear models. Results for physical activity could not be reliably estimated due to near-zero baseline (cf. Figure 2). Error bars indicate SEM.

**Figure 4. Augmenting single-modality feature spaces to incorporate multimodal input data can lead to improvements in prediction accuracy on par with doubling the sample size**
The 512 leading principal components of single-modality data, or of concatenated dual-modality data, were used as the basis for phenotype prediction. Pictured is the min-max scaled prediction accuracy, with accuracy at 1,000 training samples representing the origin of the respective graph. Switching from single modalities to multimodal input data led to improvements in prediction accuracy for all target phenotypes. For 10 out of 16 target phenotypes, improvements from multimodality were comparable to improvements from doubling the sample size from 8,000 to 16,000. Different brain imaging modalities appear to provide complementary, nonredundant predictive information for most target phenotypes (see Figure S7 for an alternative visualization).
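
To illustrate the idea of concatenating modalities before dimensionality reduction, here is a minimal sketch on synthetic data. It assumes two made-up "modalities" that each carry a different latent signal, uses 5 principal components instead of 512 to keep the toy problem small, and fits a closed-form ridge regression on the component scores. The helper `pca_ridge_r2` and all data are hypothetical and are not taken from the study.

```python
import numpy as np

def pca_ridge_r2(X_train, y_train, X_test, y_test, n_components=5, lam=1.0):
    """Project onto leading principal components, fit ridge, return test R^2."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    P = Vt[:n_components].T                      # principal axes from training data only
    Z_train, Z_test = (X_train - mu) @ P, (X_test - mu) @ P
    y_mean = y_train.mean()
    A = Z_train.T @ Z_train + lam * np.eye(n_components)
    w = np.linalg.solve(A, Z_train.T @ (y_train - y_mean))
    pred = Z_test @ w + y_mean
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_train, n_test, p = 400, 200, 20
n = n_train + n_test

# Two latent signals; each synthetic modality reflects only one of them.
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X_a = np.outer(z1, rng.normal(size=p)) + 0.5 * rng.normal(size=(n, p))
X_b = np.outer(z2, rng.normal(size=p)) + 0.5 * rng.normal(size=(n, p))
y = z1 + z2 + 0.1 * rng.normal(size=n)           # target depends on both signals

tr, te = slice(0, n_train), slice(n_train, n)
X_ab = np.hstack([X_a, X_b])                     # concatenated dual-modality features

r2_a = pca_ridge_r2(X_a[tr], y[tr], X_a[te], y[te])
r2_b = pca_ridge_r2(X_b[tr], y[tr], X_b[te], y[te])
r2_ab = pca_ridge_r2(X_ab[tr], y[tr], X_ab[te], y[te])
print(f"R^2 modality A: {r2_a:.2f}, modality B: {r2_b:.2f}, concatenated: {r2_ab:.2f}")
```

The toy data are constructed so that the modalities are fully complementary, which exaggerates the multimodal gain relative to what real imaging data would show.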

**Figure 5. Linear models performed on par with nonlinear machine learning models in neuroimaging-based phenotype prediction**
We found no consistent evidence of exploitable predictive nonlinear structure in neuroimaging data. Only for DWI-based prediction of sex and age at large (>16,000) training sample sizes did nonlinear models marginally outperform their linear counterparts. Pictured are results for linear and RBF-kernelized nonlinear ridge regression. For other nonlinear machine learning models, see the supplemental information. Error bars indicate SEM.
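
The comparison between linear and RBF-kernelized ridge regression can be sketched with plain NumPy. This is a toy illustration under stated assumptions, not a reproduction of the study: the data are synthetic with a purely linear signal, and the regularization strength `lam` and kernel width `gamma` are arbitrary choices rather than tuned values.

```python
import numpy as np

def r2(y_true, y_pred):
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    """Closed-form linear ridge regression with centering."""
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu_x
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ w + mu_y

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_ridge_fit_predict(X_tr, y_tr, X_te, lam=0.1, gamma=0.05):
    """Kernel ridge regression: solve (K + lam*I) dual = y, predict with K_test @ dual."""
    mu_y = y_tr.mean()
    K = rbf_kernel(X_tr, X_tr, gamma)
    dual = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr - mu_y)
    return rbf_kernel(X_te, X_tr, gamma) @ dual + mu_y

rng = np.random.default_rng(0)
n_train, n_test = 300, 200
w_true = np.array([1.0, -0.5, 0.8, 0.3, -1.2])   # purely linear ground truth
X = rng.normal(size=(n_train + n_test, w_true.size))
y = X @ w_true + 0.1 * rng.normal(size=n_train + n_test)

X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

r2_linear = r2(y_te, ridge_fit_predict(X_tr, y_tr, X_te))
r2_rbf = r2(y_te, kernel_ridge_fit_predict(X_tr, y_tr, X_te))
print(f"linear ridge R^2: {r2_linear:.3f}, RBF kernel ridge R^2: {r2_rbf:.3f}")
```

When the underlying relationship is linear, as in this toy setup, the nonlinear model has nothing extra to exploit, which is the pattern the caption reports for most prediction tasks.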

Cited by

Advances and challenges in neuroimaging-based pain biomarkers.
Zhang LB, Chen YX, Li ZJ, Geng XY, Zhao XY, Zhang FR, Bi YZ, Lu XJ, Hu L. Cell Rep Med. 2024 Oct 15;5(10):101784. doi: 10.1016/j.xcrm.2024.101784. Epub 2024 Oct 8. PMID: 39383872. Free PMC article. Review.
Group-to-individual generalizability and individual-level inferences in cognitive neuroscience.
Mattoni M, Fisher AJ, Gates KM, Chein J, Olino TM. Neurosci Biobehav Rev. 2025 Feb;169:106024. doi: 10.1016/j.neubiorev.2025.106024. Epub 2025 Jan 30. PMID: 39889869. Review.
Prediction of cognitive performance differences in older age from multimodal neuroimaging data.
Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, Caspers S, Jockwitz C. Geroscience. 2024 Feb;46(1):283-308. doi: 10.1007/s11357-023-00831-4. Epub 2023 Jun 13. PMID: 37308769. Free PMC article.
Do Transformers and CNNs Learn Different Concepts of Brain Age?
Siegel NT, Kainmueller D, Deniz F, Ritter K, Schulz MA. Hum Brain Mapp. 2025 Jun 1;46(8):e70243. doi: 10.1002/hbm.70243. PMID: 40489428. Free PMC article.
Voxel-Wise or Region-Wise Nuisance Regression for Functional Connectivity Analyses: Does It Matter?
Muganga T, Sasse L, Larabi DI, Nieto N, Caspers J, Eickhoff SB, Patil KR. Hum Brain Mapp. 2025 Aug 15;46(12):e70323. doi: 10.1002/hbm.70323. PMID: 40838474. Free PMC article.

Grants and funding

R01 AG068563/AG/NIA NIH HHS/United States
