Meta-Analysis
Psychol Sci. 2020 Jul;31(7):792-806.
doi: 10.1177/0956797620916786. Epub 2020 Jun 3.

What Is the Test-Retest Reliability of Common Task-Functional MRI Measures? New Empirical Evidence and a Meta-Analysis

Maxwell L Elliott et al. Psychol Sci. 2020 Jul.

Abstract

Identifying brain biomarkers of disease risk is a growing priority in neuroscience. The ability to identify meaningful biomarkers is limited by measurement reliability; unreliable measures are unsuitable for predicting clinical outcomes. Measuring brain activity using task functional MRI (fMRI) is a major focus of biomarker development; however, the reliability of task fMRI has not been systematically evaluated. We present converging evidence demonstrating poor reliability of task-fMRI measures. First, a meta-analysis of 90 experiments (N = 1,008) revealed poor overall reliability: mean intraclass correlation coefficient (ICC) = .397. Second, the test-retest reliabilities of activity in a priori regions of interest across 11 common fMRI tasks collected by the Human Connectome Project (N = 45) and the Dunedin Study (N = 20) were poor (ICCs = .067–.485). Collectively, these findings demonstrate that common task-fMRI measures are not currently suitable for brain biomarker discovery or for individual-differences research. We review how this state of affairs came to be and highlight avenues for improving task-fMRI reliability.

Keywords: cognitive neuroscience; individual differences; neuroimaging; statistical analysis.
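
To make the reliability statistic in the abstract concrete, here is a minimal sketch of a test-retest ICC computed from a subjects-by-sessions table. The abstract does not state which ICC form the authors used, so the two-way, single-measure consistency form (often written ICC(3,1)) is an assumption here. The function name `icc_consistency` and the example scores are hypothetical.

```python
def icc_consistency(scores):
    """Two-way consistency ICC for single measures (assumed form, ICC(3,1)).

    scores: list of rows, one row per subject, one value per session.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions (e.g., test and retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    session_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


# Hypothetical data: three subjects whose rank order is preserved across
# two sessions, with a constant session shift that consistency ICC ignores.
stable = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
# Hypothetical data: rank order fully reversed between sessions.
reversed_order = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
```

An ICC near 1 means that between-subject differences dominate the session-to-session noise. The values reported in the abstract (.067 to .485 for the region-of-interest analyses) indicate that much of the variance in these measures is not stable across sessions.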

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Figures

Fig. 1.
The influence of task-functional MRI (fMRI) test-retest reliability on the sample size required for 80% power to detect brain–behavior correlations of effect sizes commonly found in psychological research. Power curves are shown for three levels of reliability of the associated behavioral or clinical phenotype. The figure was generated using the pwr.r.test function in R (Champely, 2018), with the value for r specified according to the attenuation formula in the Appendix. ICC = intraclass correlation coefficient.
Fig. 2.
Flow diagram for the systematic literature review and meta-analysis.
Fig. 3.
Meta-analysis forest plot displaying the estimate of test-retest reliability for each task-functional MRI (fMRI) measure from all intraclass correlation coefficients (ICCs) reported in each study. The first column labels each article by the first author’s last name and year of publication. References for all articles listed here are provided in the Supplemental Material available online. In the subject-type column, “h” indicates that the sample in the study consisted of healthy controls, and “c” indicates a clinical sample. Studies are split into two subgroups. In the first group of studies, authors reported all ICCs that were calculated, thereby allowing for a relatively unbiased estimate of reliability. In the second group of studies, authors selected a subset of calculated ICCs (on the basis of the magnitude of the ICC or another nonindependent statistic) and then reported ICCs only from that subset. This practice led to inflated reliability estimates, and therefore these studies were meta-analyzed separately to highlight this bias. Error bars indicate 95% confidence intervals (CIs). MID = monetary incentive delay; LH = left hand, RH = right hand.
Fig. 4.
Whole-brain activation and reliability maps for three task-functional MRI measures used in both the Human Connectome Project and the Dunedin Study. For each task, a whole-brain activation map of the primary within-subjects contrast (t score) is displayed in warm colors (top), and a whole-brain map of the between-subjects reliability (intraclass correlation coefficient, or ICC) is shown in cool colors (bottom). For each task, the target region of interest is outlined in sky blue. The activation maps are thresholded at p < .05 and are whole-brain corrected for multiple comparisons using threshold-free cluster enhancement. The ICC maps are thresholded so that voxels with ICCs of less than .4 are not colored. Values for X, Y, and Z are given in Montreal Neurological Institute coordinates.
Fig. 5.
Test-retest reliabilities of region-wise activation measures in 11 commonly used task-functional MRI paradigms and three common structural MRI measures, separately for the Human Connectome Project (left) and the Dunedin Study (right). For each task, intraclass correlation coefficients (ICCs) were estimated for activation in the a priori target region of interest (ROI; circled in black) and in nontarget ROIs selected from the other tasks. Nontarget ROIs were the anterior temporal lobe (ATL), dorsolateral prefrontal cortex (dlPFC), precentral gyrus (PCG), rostrolateral prefrontal cortex (rlPFC), and ventral striatum (VS). As a benchmark, ICCs of three common structural MRI measures—cortical thickness (CT), surface area (SA), and subcortical volume—are depicted as violin plots representing the distribution of ICCs for each of the 360 parcels for CT and SA and the 17 subcortical structures for gray-matter volume. Negative ICCs are set to 0 for purposes of visualization. EF = executive function.
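
The Fig. 1 caption describes how measurement reliability inflates the sample size needed for 80% power. The sketch below illustrates that reasoning under two assumptions. First, the attenuation formula from the Appendix is not reproduced on this page, so the classical Spearman form (observed r = true r × √(reliability of brain measure × reliability of behavioral measure)) is assumed. Second, the authors used `pwr.r.test` in R; this sketch swaps in the standard Fisher z approximation, so its results may differ slightly from theirs. The function names are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuated_r(true_r, rel_brain, rel_behavior):
    """Observed correlation expected under the assumed Spearman attenuation."""
    return true_r * sqrt(rel_brain * rel_behavior)


def n_for_power(r, power=0.80, alpha=0.05):
    """Approximate N for a two-sided test of a correlation (Fisher z method).

    This is an approximation, not the algorithm used by pwr.r.test.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


# Hypothetical scenario: a true brain-behavior correlation of .30, a
# behavioral measure with reliability .90, and a task-fMRI measure whose
# reliability equals the meta-analytic mean ICC of .397 from the abstract.
observed = attenuated_r(0.30, 0.397, 0.90)
n_unreliable = n_for_power(observed)
n_perfect = n_for_power(0.30)  # both measures perfectly reliable
```

Comparing `n_unreliable` with `n_perfect` shows the pattern the figure plots: as the reliability of the fMRI measure falls, the observable correlation shrinks and the required sample size grows.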

References

    1. Barch D. M., Burgess G. C., Harms M. P., Petersen S. E., Schlaggar B. L., Corbetta M. . . . WU-Minn HCP Consortium. (2013). Function in the human connectome: Task-fMRI and individual differences in behavior. NeuroImage, 80, 169–189.
    2. Bennett C. M., Miller M. B. (2010). How reliable are the results from functional magnetic resonance imaging? Annals of the New York Academy of Sciences, 1191, 133–155.
    3. Borenstein M., Hedges L. V., Higgins J. P. T., Rothstein H. R. (2009). Introduction to meta-analysis. Chichester, England: John Wiley. doi:10.1002/9780470743386
    4. Button K. S., Ioannidis J. P. A., Mokrysz C., Nosek B. A., Flint J., Robinson E. S. J., Munafò M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
    5. Champely S. (2018). Package ‘pwr.’ Retrieved from http://cran.r-project.org/package=pwr