Imaging Neurosci (Camb). 2025 Apr 22;3:imag_a_00547. doi: 10.1162/imag_a_00547. eCollection 2025.

Reliability of structural brain change in cognitively healthy adult samples

Didac Vidal-Piñeiro et al. Imaging Neurosci (Camb). 2025.

Abstract

In neuroimaging research, tracking individuals over time is key to understanding the interplay between brain changes and genetic, environmental, or cognitive factors across the lifespan. Yet how precisely we can estimate individual trajectories of brain change over time remains uncertain. In this study, we estimated the reliability of structural brain change in cognitively healthy adults from multiple samples and assessed the influence of follow-up time and number of observations. Estimates of cross-sectional measurement error and brain change variance were obtained using the longitudinal FreeSurfer processing stream. Our findings showed, on average, modest longitudinal reliability with 2 years of follow-up. Increasing the follow-up time was associated with a substantial increase in longitudinal reliability, while the impact of increasing the number of observations was comparatively minor. On average, 2-year follow-up studies require ≈2.7 and ≈4.0 times more individuals than designs with follow-ups of 4 and 6 years, respectively, to achieve comparable statistical power. Subcortical volume exhibited higher longitudinal reliability than cortical area, thickness, and volume. The analytically derived reliability estimates were comparable with those estimated from empirical data. The reliability estimates were affected both by the cohort's age, with younger adults showing lower reliability of change, and by the preprocessing pipeline, with FreeSurfer's longitudinal stream notably superior to the cross-sectional stream. Suboptimal reliability inflated sample size requirements and compromised the ability to distinguish individual trajectories of brain aging. This study underscores the importance of long-term follow-ups and the need to consider reliability in longitudinal neuroimaging research.

Keywords: aging; longitudinal; observations; reliability; structural MRI; study duration; validity.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic representation of time and follow-up effects on reliability. Hypothetical scenario illustrating a participant scanned twice at each time point, represented by the green and red lines. Measurement error causes deviations in the estimated slopes from the true trajectory (black line), which represents true change over time. In the main plots, points represent observed cross-sectional measurements, lines represent estimated longitudinal (linear) trajectories, and density plots represent the distribution of possible values for a given cross-sectional observation. The boxes show the observed yearly brain change. (a) Effects of follow-up time: Extending the follow-up time from 2 to 4 years reduces the impact of cross-sectional measurement error on yearly change estimates. (b) Effects of the number of observations: Increasing the number of observations reduces the impact of measurement error on yearly change estimates.
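The mechanism in this schematic can be illustrated with a minimal Python sketch. It assumes the textbook ordinary-least-squares result that the sampling variance of a fitted slope equals the cross-sectional error variance divided by the sum of squared deviations of the scan times, and it defines reliability as true slope variance over total observed slope variance. This is not necessarily the derivation used in the article; the function names and the numeric inputs are hypothetical.

```python
def slope_error_variance(sigma_e2, follow_up_years, n_obs):
    """Sampling variance of an OLS slope fitted to n_obs equispaced scans.

    sigma_e2 is the cross-sectional measurement error variance (assumed
    constant across scans); follow_up_years is the total study duration.
    """
    if n_obs < 2:
        raise ValueError("at least two observations are needed to fit a slope")
    step = follow_up_years / (n_obs - 1)
    times = [i * step for i in range(n_obs)]
    mean_t = sum(times) / n_obs
    # Sum of squared deviations of scan times around their mean.
    sst = sum((t - mean_t) ** 2 for t in times)
    return sigma_e2 / sst


def slope_reliability(var_true_slope, sigma_e2, follow_up_years, n_obs):
    """Share of observed slope variance that reflects true change (ICC-like)."""
    err = slope_error_variance(sigma_e2, follow_up_years, n_obs)
    return var_true_slope / (var_true_slope + err)


if __name__ == "__main__":
    # Hypothetical inputs: true slope variance 0.5, error variance 1.0.
    for years, n_obs in [(2, 2), (4, 2), (2, 3), (4, 3)]:
        icc = slope_reliability(0.5, 1.0, years, n_obs)
        print(f"{years} years, {n_obs} obs: reliability = {icc:.2f}")
```

Under these assumptions, doubling the follow-up time from 2 to 4 years divides the slope error variance by four, whereas adding a third scan at the midpoint of the same interval leaves it unchanged, because the midpoint contributes nothing to the sum of squared time deviations.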
Fig. 2.
Longitudinal reliability of structural brain features. (a) Mean reliability (ICC) of structural brain change across features as a function of total follow-up time and number of equispaced observations. Error bars represent ± 1 SD. (b) Longitudinal reliability (ICC) for individual structural features, grouped by modality, shown for follow-up time of 4 years and three observations. Subcortical features are numbered as follows: 1. Lateral Ventricle, 2. Caudate, 3. Thalamus, 4. Pallidum, 5. Putamen, 6. Amygdala, 7. Hippocampus. Obs. = Number of observations. ICC = Intraclass Correlation Coefficient.
Fig. 3.
Power analysis for detecting correlations with longitudinal brain change. (a) Mean required sample size across structural features (with power = 80%, p < .05) for detecting correlations with longitudinal brain change of small, medium, and large effect sizes (based on conventional guidelines) across different follow-up times and number of observations. Gray horizontal lines are the estimated effect sizes given reliability ICC = 1. Estimated sample size required to detect significant correlations (p < .05, 80% power) between (b) the left hippocampus volume, (c) left entorhinal thickness and phenotypes with real correlations ranging from r = 0.05 to 0.4, across different follow-up times and number of observations. For visualization purposes in (b) and (c), follow-up time is capped at 8 years and the number of observations shown is 3 and 7. r = Pearson’s Correlation. Obs. = Number of observations. Cth = Cortical thickness.
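How reliability feeds into sample size can be sketched as follows. The sketch assumes the classical attenuation formula (observed correlation = true correlation × √ICC, with the phenotype measured without error), a two-sided test, and the Fisher z approximation for the required sample size. These are common textbook choices, not necessarily the procedure used in the article, and `required_n` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def required_n(r_true, icc, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation with brain change.

    r_true is the correlation with error-free change; icc is the reliability
    of the observed change. The observed correlation is attenuated by
    sqrt(icc), and n follows from the Fisher z approximation (two-sided).
    """
    if not 0 < icc <= 1:
        raise ValueError("icc must lie in (0, 1]")
    r_obs = r_true * math.sqrt(icc)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r_obs)) ** 2 + 3)


if __name__ == "__main__":
    # Hypothetical reliabilities, for illustration only.
    for icc in (1.0, 0.5, 0.25):
        print(f"ICC = {icc:.2f}: n = {required_n(0.3, icc)}")
```

With perfect reliability this reproduces the familiar figure of 85 participants for r = 0.3 at 80% power; lower reliability shrinks the observable correlation and raises the required sample size accordingly.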
Fig. 4.
Overlapping between observed estimates of change. (a) Mean Bhattacharyya coefficient (BC) across features, quantifying the degree of overlap between two samples as a function of follow-up time and number of observations. The distributions represent possible observed estimates of brain change for three individuals: a normal ager, a maintainer, and a decliner who show decline at an average rate, 1 SD slower, and 1 SD faster, respectively. Overlap in observed brain change distributions for the (b) left hippocampus volume, and (c) left entorhinal thickness. For (b) and (c), distributions are shown for three and seven observations and 2, 6, and 10 years of study duration.
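For readers unfamiliar with the Bhattacharyya coefficient, the sketch below computes it for two univariate normal distributions using the standard closed form. Treating each individual's possible observed slopes as normal, centered on the true slope with a standard deviation equal to the slope's measurement error, is an assumption made here for illustration; the function name and the numbers are hypothetical.

```python
import math


def bhattacharyya_normal(mu1, sd1, mu2, sd2):
    """Bhattacharyya coefficient between N(mu1, sd1^2) and N(mu2, sd2^2).

    Returns 1.0 for identical distributions and approaches 0 as they separate.
    """
    var_sum = sd1 ** 2 + sd2 ** 2
    distance = (0.25 * (mu1 - mu2) ** 2 / var_sum
                + 0.5 * math.log(var_sum / (2 * sd1 * sd2)))
    return math.exp(-distance)


if __name__ == "__main__":
    # Two individuals whose true yearly change differs by 1 unit (hypothetical),
    # observed with progressively smaller slope error.
    for error_sd in (2.0, 1.0, 0.5):
        bc = bhattacharyya_normal(-1.0, error_sd, 0.0, error_sd)
        print(f"slope error SD = {error_sd}: BC = {bc:.3f}")
```

When both distributions share the same error SD, the coefficient reduces to exp(−Δμ² / (8σ²)), so any design change that shrinks the slope error reduces the overlap between individuals.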
Fig. 5.
Misclassification based on external criteria. Misclassification of individuals based on an external criterion, that is, whether they exhibit no brain decline over the duration of the study. The density plots show the distribution of real trajectories of those subjects for whom we would observe no brain decline over time. Green and red fillings represent the proportion of real brain maintenance and real brain decliners, respectively. The text represents the proportion of participants showing no observed brain decline. Shown for the (a) left hippocampus volume and (b) left entorhinal thickness. Distributions displayed at three and seven observations and 2, 6, and 10 years of study duration. P(Mtrue|Mobs) = Probability of being a true brain maintainer given observed brain maintenance. P(Dtrue|Mobs) = Probability of being a true brain decliner given observed brain maintenance.
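The conditional probability P(Mtrue|Mobs) can be approximated with a small Monte Carlo sketch. It assumes normally distributed true slopes, independent normal measurement error on the slope, and "no decline" defined as a slope of zero or above; all three are simplifying assumptions for illustration rather than the article's procedure, and the function name and parameter values are hypothetical.

```python
import random


def p_true_maintainer_given_observed(mean_slope, sd_true, sd_error,
                                     n_sim=100_000, seed=0):
    """Estimate P(true slope >= 0 | observed slope >= 0) by simulation."""
    rng = random.Random(seed)
    observed_maintainers = 0
    true_maintainers = 0
    for _ in range(n_sim):
        true_slope = rng.gauss(mean_slope, sd_true)
        # Observed slope = true slope plus independent measurement error.
        observed_slope = true_slope + rng.gauss(0.0, sd_error)
        if observed_slope >= 0:
            observed_maintainers += 1
            if true_slope >= 0:
                true_maintainers += 1
    if observed_maintainers == 0:
        return float("nan")
    return true_maintainers / observed_maintainers


if __name__ == "__main__":
    # Average decline of 1 unit per year with SD 1 (hypothetical values).
    for sd_error in (2.0, 1.0, 0.5):
        p = p_true_maintainer_given_observed(-1.0, 1.0, sd_error)
        print(f"slope error SD = {sd_error}: P(Mtrue|Mobs) ~ {p:.2f}")
```

P(Dtrue|Mobs) is simply one minus this value. In this sketch, a larger slope error means that more of the apparent maintainers are in fact decliners.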
Fig. 6.
Longitudinal reliability and sample characteristics. Effect of cohort’s age on longitudinal reliability. Older individuals exhibit higher reliability than younger individuals, due to greater variability in slope estimates. Longitudinal reliability as a function of follow-up time, age, and number of observations for the (a) left hippocampus volume and (b) left entorhinal thickness. Only distributions at three and seven observations are shown. ICC = Intraclass Correlation Coefficient.
Fig. 7.
Longitudinal reliability using FreeSurfer cross-sectional stream. Impact of preprocessing stream on longitudinal reliability. (a) Mean reliability (ICC) of structural brain change across features as a function of total follow-up time and number of equispaced observations, estimated using the FreeSurfer cross-sectional stream. Mean differences in longitudinal reliability, by (b) follow-up time and number of observations and (c) modality, between data processed with the longitudinal versus cross-sectional FreeSurfer stream. Positive ΔICC indicates improved reliability estimates when using the longitudinal FreeSurfer Stream. Error bars represent ± 1 SD. FS = FreeSurfer. Obs. = Observations. ICC = Intraclass Correlation Coefficient.
Fig. 8.
Longitudinal reliability estimated empirically. Error variance of the slopes was estimated from a multi-cohort dataset rather than being analytically derived from the GRR index. (a) Mean reliability (ICC) of structural brain change across features as a function of total follow-up time and number of equispaced observations. Mean differences in longitudinal reliability by (b) follow-up time and number of observations and (c) modality, between the analytically derived and the empirical estimations of reliability. Positive ΔICC indicates higher estimates for the analytical derivation of reliability. Error bars represent ± 1 SD. Obs. = Observations. ICC = Intraclass Correlation Coefficient.
