Neuroimage. 2021 Aug 1;236:118044.
doi: 10.1016/j.neuroimage.2021.118044. Epub 2021 Apr 10.

Resample aggregating improves the generalizability of connectome predictive modeling

David O'Connor et al. Neuroimage.

Abstract

It is a longstanding goal of neuroimaging to produce reliable, generalizable models of brain–behavior relationships. More recently, data-driven predictive models have become popular. However, overfitting is a common problem with statistical models and impedes model generalization. Cross-validation (CV) is often used to estimate expected model performance within-sample. Yet the best way to generate brain–behavior models, and to apply them out-of-sample to an unseen dataset, remains unclear. As a solution, this study proposes an ensemble learning method, in this case resample aggregating, that encompasses both model parameter estimation and feature selection. Here we investigate the use of resample-aggregated models to estimate fluid intelligence (fIQ) from fMRI-based functional connectivity (FC) data. We take advantage of two large, openly available datasets: the Human Connectome Project (HCP) and the Philadelphia Neurodevelopmental Cohort (PNC). We generate aggregated and non-aggregated models of fIQ in the HCP using the Connectome Predictive Modeling (CPM) framework. Over various train–test splits, these models are evaluated within-sample, on left-out HCP data, and out-of-sample, on PNC data. We find that a resample-aggregated model performs best both within- and out-of-sample. We also find that feature selection can vary substantially within-sample. More robust feature selection methods, as detailed here, are needed to improve the cross-sample performance of CPM-based brain–behavior models.
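The resample-aggregating idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`fit_cpm`, `bagged_cpm_predict`), the correlation-based edge-selection cutoff, and the single summed-edge predictor are simplifying assumptions loosely modeled on the CPM framework — each bootstrap resample gets its own feature selection and model fit, and the test-set predictions are averaged across resamples.

```python
import numpy as np

def fit_cpm(X, y, t_cut=2.6):
    """Toy CPM fit: select edges correlated with behavior, fit a linear model.

    X : (n_subjects, n_edges) connectivity matrix, y : (n_subjects,) behavior.
    Returns a boolean edge-selection mask and (slope, intercept).
    """
    n = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of each edge with behavior
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    # t-statistic on r; |t| > ~2.6 approximates p < 0.01 two-sided for large n
    t = r * np.sqrt((n - 2) / (1 - r ** 2 + 1e-12))
    sel = np.abs(t) > t_cut
    # collapse selected edges into one summed predictor, fit y = a*s + b
    s = X[:, sel].sum(axis=1)
    a, b = np.polyfit(s, y, 1)
    return sel, (a, b)

def bagged_cpm_predict(X_train, y_train, X_test, n_boot=100, rng=None):
    """Resample-aggregated CPM: refit on bootstrap resamples, average predictions."""
    rng = np.random.default_rng(rng)
    n = len(y_train)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # bootstrap resample with replacement
        sel, (a, b) = fit_cpm(X_train[idx], y_train[idx])
        if sel.sum() == 0:
            continue  # no edges survived selection in this resample
        preds.append(a * X_test[:, sel].sum(axis=1) + b)
    return np.mean(preds, axis=0)
```

Because both feature selection and the regression fit are repeated inside every resample, edges that are only spuriously correlated in one draw of the data contribute little to the aggregated prediction.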


Figures

Fig. 1.
Analytic workflow. The PNC data is only used for out-of-sample testing. The HCP data is split into train and test samples. The train sample (400 subjects) is used to train 3 types of models: (1) resample aggregated models, (2) CV models, and (3) train-only models. All models are then tested within-sample on the test HCP sample, and out-of-sample on the PNC dataset.
Fig. 2.
Within- and out-of-sample model performance, stratified by data split. In the left panel (purple), the first three columns show the performance of resample-aggregated models within-sample, columns 4–7 show the CV models, and the eighth column shows the performance of the train-only models. Each column has 20 boxplots, color-coded (in rainbow) by train/test split. All models are tested within-sample on 100 random subsamples of 200 subjects from the HCP test sample. The second panel (green) shows the performance of the same models (same order as the left panel) out-of-sample, using random subsamples of 200 subjects from the PNC data set.
Fig. 3.
Within and out-of-sample model performance. Column one (shaded in purple) shows performance of all models, across all data splits, within-sample. All models are tested on 100 random subsamples of 200 subjects from the test sample of the HCP data set. The second column (shaded in green) shows the performance of the same models tested on random subsamples of 200 subjects from the PNC data set. The box and whisker plots show the median, interquartile range, and 5%–95% markers of the performance distribution. The underlying shaded violin plots show the shape of the model performance distribution.
Fig. 4.
Distribution of feature (edge) occurrence across subsamples for the ensemble models. For the bagged model (left), 11,851 features occur in between 0% and 10% of bootstraps, compared to 6056 for the subsample 200 model (right) and 2839 for the subsample 300 model (center).
Fig. 5.
Relationship between effect size and feature occurrence for each aggregated model, across all resamples. The bagged models are shown in blue, the subsample 200 in orange, and the subsample 300 in green. The subplots on the right and top show probability density plots of the feature occurrence and effect size respectively.
Fig. 6.
Resample-aggregated model performance within-sample (white boxplots) and out-of-sample (gray boxplots) across feature frequency thresholds. This reflects the performance as tested on subsamples of 200 participants. The box and whisker plots show the median, interquartile range, and 5%–95% markers of the performance distribution. The underlying shaded distribution shows the individual data points. The top panel shows the performance of the bagged models as the feature threshold is increased (reducing the number of features included). The middle panel shows the performance of the subsample 300 models, and the bottom panel the subsample 200 models, as the feature threshold is increased. In the background of all plots, a density-based histogram of the percent of features included (as a function of all features selected for a given model) is shown in blue.
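The feature-frequency thresholding explored in Fig. 6 can be sketched as follows. This is an illustrative helper, not the paper's code: the name `threshold_by_frequency` and the mask layout are assumptions. Each row of the mask records which edges survived feature selection in one resample, and an edge is retained only if it was selected in at least a given fraction of resamples.

```python
import numpy as np

def threshold_by_frequency(selection_masks, min_frac=0.5):
    """Keep only edges selected in at least `min_frac` of the resamples.

    selection_masks : bool array of shape (n_resamples, n_edges); True where
    an edge passed feature selection in that bootstrap/subsample.
    Returns (boolean keep-mask over edges, per-edge selection frequency).
    """
    freq = selection_masks.mean(axis=0)  # fraction of resamples selecting each edge
    return freq >= min_frac, freq
```

Raising `min_frac` discards edges that appear in only a few resamples, which is the horizontal axis being swept in Fig. 6.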
