What Is the Test-Retest Reliability of Common Task-Functional MRI Measures? New Empirical Evidence and a Meta-Analysis

Meta-Analysis

Psychol Sci. 2020 Jul;31(7):792-806.

doi: 10.1177/0956797620916786. Epub 2020 Jun 3.

Maxwell L Elliott¹, Annchen R Knodt¹, David Ireland², Meriwether L Morris¹, Richie Poulton², Sandhya Ramrakha², Maria L Sison¹, Terrie E Moffitt¹,³,⁴,⁵, Avshalom Caspi¹,³,⁴,⁵, Ahmad R Hariri¹

Affiliations

¹ Department of Psychology & Neuroscience, Duke University.
² Dunedin Multidisciplinary Health and Development Research Unit, Department of Psychology, University of Otago.
³ Social, Genetic, & Developmental Psychiatry Research Centre, Institute of Psychiatry, Psychology, & Neuroscience, King's College London.
⁴ Department of Psychiatry & Behavioral Sciences, Duke University School of Medicine.
⁵ Center for Genomic and Computational Biology, Duke University.

PMID: 32489141
PMCID: PMC7370246
DOI: 10.1177/0956797620916786

Maxwell L Elliott et al. Psychol Sci. 2020 Jul.

Abstract

Identifying brain biomarkers of disease risk is a growing priority in neuroscience. The ability to identify meaningful biomarkers is limited by measurement reliability; unreliable measures are unsuitable for predicting clinical outcomes. Measuring brain activity using task functional MRI (fMRI) is a major focus of biomarker development; however, the reliability of task fMRI has not been systematically evaluated. We present converging evidence demonstrating poor reliability of task-fMRI measures. First, a meta-analysis of 90 experiments (N = 1,008) revealed poor overall reliability: mean intraclass correlation coefficient (ICC) = .397. Second, the test-retest reliabilities of activity in a priori regions of interest across 11 common fMRI tasks collected by the Human Connectome Project (N = 45) and the Dunedin Study (N = 20) were poor (ICCs = .067–.485). Collectively, these findings demonstrate that common task-fMRI measures are not currently suitable for brain biomarker discovery or for individual-differences research. We review how this state of affairs came to be and highlight avenues for improving task-fMRI reliability.
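The abstract summarizes reliability with intraclass correlation coefficients. This summary does not say which ICC variant the authors computed, so the following is only an illustrative sketch of the two Shrout and Fleiss forms most often used for test-retest data, computed from a subjects-by-sessions table: ICC(3,1), which ignores a constant shift between sessions (consistency), and ICC(2,1), which penalizes such a shift (absolute agreement). The function names and toy data are hypothetical.

```python
from statistics import fmean


def _mean_squares(scores):
    """Two-way ANOVA mean squares for a table with one row per subject
    and one column per session (n subjects, k sessions)."""
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return n, k, bms, jms, ems


def icc_3_1(scores):
    """ICC(3,1): two-way mixed, consistency, single measurement."""
    n, k, bms, jms, ems = _mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_2_1(scores):
    """ICC(2,1): two-way random, absolute agreement, single measurement."""
    n, k, bms, jms, ems = _mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical toy data: 3 subjects scanned twice; session 2 = session 1 + 1.
shifted = [[1, 2], [3, 4], [5, 6]]
print(icc_3_1(shifted))            # 1.0
print(round(icc_2_1(shifted), 3))  # 0.889
```

Because every subject shifts by the same amount, the subjects' rank order is perfectly preserved, so ICC(3,1) is 1.0, while ICC(2,1) drops because the session means differ. If subjects' values reverse order between sessions (for example `[[1, 3], [2, 2], [3, 1]]`), ICC(3,1) becomes -1.0, which illustrates how negative ICCs can arise.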

Keywords: cognitive neuroscience; individual differences; neuroimaging; statistical analysis.

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Figures

**Fig. 1.**
The influence of task-functional MRI (fMRI) test-retest reliability on the sample size required for 80% power to detect brain–behavior correlations of effect sizes commonly found in psychological research. Power curves are shown for three levels of reliability of the associated behavioral or clinical phenotype. The figure was generated using the *pwr.r.test* function in R (Champely, 2018), with the value for r specified according to the attenuation formula in the Appendix. ICC = intraclass correlation coefficient.
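Fig. 1 combines two standard ingredients. First, the observable brain-behavior correlation is the true correlation shrunk by the reliabilities of both measures; the sketch assumes the classic Spearman attenuation form, r_obs = r_true × sqrt(ICC_brain × rel_behavior), since the paper's Appendix is not reproduced here. Second, the required N follows from a power analysis for a correlation. The sketch uses the Fisher z approximation rather than `pwr.r.test` itself, so its sample sizes may differ slightly from those in the figure. The function names, effect size, and reliabilities below are illustrative assumptions.

```python
import math
from statistics import NormalDist


def attenuated_r(true_r, rel_x, rel_y):
    """Observable correlation after attenuation by measurement error in
    both variables (Spearman's attenuation formula, assumed form)."""
    return true_r * math.sqrt(rel_x * rel_y)


def n_for_power(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r (two-sided test of rho = 0)
    using Fisher's z transform; an approximation, not pwr.r.test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    true_r = 0.2        # illustrative effect size for a brain-behavior link
    rel_behavior = 0.8  # illustrative reliability of the phenotype
    for icc_brain in (1.0, 0.8, 0.6, 0.4, 0.2):
        r_obs = attenuated_r(true_r, icc_brain, rel_behavior)
        print(f"ICC={icc_brain:.1f}  observed r={r_obs:.3f}  N={n_for_power(r_obs)}")
```

Because N scales roughly with 1/r_obs², and r_obs scales with the square root of the task-fMRI ICC, halving the ICC roughly doubles the required sample size.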

**Fig. 2.**
Flow diagram for the systematic literature review and meta-analysis.

**Fig. 3.**
Meta-analysis forest plot displaying the estimate of test-retest reliability for each task-functional MRI (fMRI) measure from all intraclass correlation coefficients (ICCs) reported in each study. The first column labels each article by the first author’s last name and year of publication. References for all articles listed here are provided in the Supplemental Material available online. In the subject-type column, “h” indicates that the sample in the study consisted of healthy controls, and “c” indicates a clinical sample. Studies are split into two subgroups. In the first group of studies, authors reported all ICCs that were calculated, thereby allowing for a relatively unbiased estimate of reliability. In the second group of studies, authors selected a subset of calculated ICCs (on the basis of the magnitude of the ICC or another nonindependent statistic) and then reported ICCs only from that subset. This practice led to inflated reliability estimates, and therefore these studies were meta-analyzed separately to highlight this bias. Error bars indicate 95% confidence intervals (CIs). MID = monetary incentive delay; LH = left hand, RH = right hand.
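The caption does not state the pooling model behind the forest plot. As a hedged sketch of one standard approach described in general meta-analysis texts such as Borenstein et al. (2009), each ICC can be Fisher-z transformed and combined with a DerSimonian-Laird random-effects model. Treating the sampling variance of a z-transformed ICC as 1/(n - 3), as for a Pearson r, is a simplifying assumption, and the study values below are made up rather than taken from the meta-analysis.

```python
import math
from statistics import NormalDist


def pool_iccs(studies, ci=0.95):
    """DerSimonian-Laird random-effects pooled ICC.

    studies: list of (icc, n) pairs with -1 < icc < 1 and n > 3.
    Each ICC is Fisher-z transformed with sampling variance 1/(n - 3),
    a Pearson-r approximation assumed here for illustration.
    Returns (pooled_icc, ci_low, ci_high, tau2); tau2 is on the z scale.
    """
    y = [math.atanh(icc) for icc, _ in studies]
    v = [1.0 / (n - 3) for _, n in studies]
    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the DL estimate of between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)
    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + ci / 2)
    # Back-transform the estimate and CI bounds to the ICC scale.
    return math.tanh(y_re), math.tanh(y_re - z * se), math.tanh(y_re + z * se), tau2


# Made-up (icc, n) pairs for illustration only.
studies = [(0.35, 20), (0.50, 30), (0.20, 15), (0.60, 40)]
est, lo, hi, tau2 = pool_iccs(studies)
print(f"pooled ICC = {est:.3f} [{lo:.3f}, {hi:.3f}], tau^2 = {tau2:.4f}")
```

Calling `pool_iccs` separately on each subgroup mirrors the caption's design choice of keeping the selectively reported ICCs apart from the fully reported ones, so the inflation in the second group remains visible rather than being averaged in.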

**Fig. 4.**
Whole-brain activation and reliability maps for three task-functional MRI measures used in both the Human Connectome Project and the Dunedin Study. For each task, a whole-brain activation map of the primary within-subjects contrast (t score) is displayed in warm colors (top), and a whole-brain map of the between-subjects reliability (intraclass correlation coefficient, or ICC) is shown in cool colors (bottom). For each task, the target region of interest is outlined in sky blue. The activation maps are thresholded at p < .05 and are whole-brain corrected for multiple comparisons using threshold-free cluster enhancement. The ICC maps are thresholded so that voxels with ICCs of less than .4 are not colored. Values for X, Y, and Z are given in Montreal Neurological Institute coordinates.
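An ICC map like those in Fig. 4 amounts to computing a test-retest ICC independently at every voxel and leaving voxels below .4 uncolored. The following minimal sketch of that masking step assumes ICC(3,1) (the caption does not name the variant), two sessions, and small nested lists standing in for image arrays; real data would typically be loaded as arrays from image files. The function names and data are hypothetical.

```python
from statistics import fmean


def icc_3_1_two_sessions(s1, s2):
    """ICC(3,1) at one voxel; s1[i] and s2[i] are subject i's values
    in sessions 1 and 2 (k = 2 sessions)."""
    n = len(s1)
    both = s1 + s2
    grand = fmean(both)
    subj_means = [(a + b) / 2 for a, b in zip(s1, s2)]
    ss_subj = 2 * sum((m - grand) ** 2 for m in subj_means)
    ss_sess = n * sum((m - grand) ** 2 for m in (fmean(s1), fmean(s2)))
    ss_total = sum((x - grand) ** 2 for x in both)
    bms = ss_subj / (n - 1)
    ems = (ss_total - ss_subj - ss_sess) / (n - 1)  # (n-1)(k-1) with k = 2
    if bms + ems == 0:
        return float("nan")  # constant voxel: ICC undefined
    return (bms - ems) / (bms + ems)


def thresholded_icc_map(session1, session2, threshold=0.4):
    """session1[i][v] is subject i's contrast value at voxel v.
    Returns one ICC per voxel, or None where ICC < threshold (uncolored)."""
    out = []
    for v in range(len(session1[0])):
        s1 = [subj[v] for subj in session1]
        s2 = [subj[v] for subj in session2]
        icc = icc_3_1_two_sessions(s1, s2)
        # NaN >= threshold is False, so undefined voxels are also masked.
        out.append(icc if icc >= threshold else None)
    return out


# Hypothetical data: 3 subjects x 2 voxels, scanned twice.
session1 = [[1, 1], [3, 2], [5, 3]]
session2 = [[2, 3], [4, 2], [6, 1]]
print(thresholded_icc_map(session1, session2))  # [1.0, None]
```

Voxel 0 preserves subjects' rank order across sessions and is kept; voxel 1 reverses it (ICC = -1) and is masked.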

**Fig. 5.**
Test-retest reliabilities of region-wise activation measures in 11 commonly used task-functional MRI paradigms and three common structural MRI measures, separately for the Human Connectome Project (left) and the Dunedin Study (right). For each task, intraclass correlation coefficients (ICCs) were estimated for activation in the a priori target region of interest (ROI; circled in black) and in nontarget ROIs selected from the other tasks. Nontarget ROIs were the anterior temporal lobe (ATL), dorsolateral prefrontal cortex (dlPFC), precentral gyrus (PCG), rostrolateral prefrontal cortex (rlPFC), and ventral striatum (VS). As a benchmark, ICCs of three common structural MRI measures—cortical thickness (CT), surface area (SA), and subcortical volume—are depicted as violin plots representing the distribution of ICCs for each of the 360 parcels for CT and SA and the 17 subcortical structures for gray-matter volume. Negative ICCs are set to 0 for purposes of visualization. EF = executive function.

Comment in

Need for Psychometric Theory in Neuroscience Research and Training: Reply to Kragel et al. (2021).
Elliott ML, Knodt AR, Caspi A, Moffitt TE, Hariri AR. Psychol Sci. 2021 Apr;32(4):627-629. doi: 10.1177/0956797621996665. Epub 2021 Mar 8. PMID: 33685291.
Functional MRI Can Be Highly Reliable, but It Depends on What You Measure: A Commentary on Elliott et al. (2020).
Kragel PA, Han X, Kraynak TE, Gianaros PJ, Wager TD. Psychol Sci. 2021 Apr;32(4):622-626. doi: 10.1177/0956797621989730. Epub 2021 Mar 8. PMID: 33685310.

Cited by

Noxious pressure stimulation demonstrates robust, reliable estimates of brain activity and self-reported pain.
Jackson JB, O'Daly O, Makovac E, Medina S, Rubio AL, McMahon SB, Williams SCR, Howard MA. Neuroimage. 2020 Nov 1;221:117178. doi: 10.1016/j.neuroimage.2020.117178. Epub 2020 Jul 22. PMID: 32707236.
Elevating the field for applying neuroimaging to individual patients in psychiatry.
Roalf DR, Figee M, Oathes DJ. Transl Psychiatry. 2024 Feb 10;14(1):87. doi: 10.1038/s41398-024-02781-7. PMID: 38341414. Review.
Characterization of whole-brain task-modulated functional connectivity in response to nociceptive pain: A multisensory comparison study.
Li L, Di X, Zhang H, Huang G, Zhang L, Liang Z, Zhang Z. Hum Brain Mapp. 2022 Feb 15;43(3):1061-1075. doi: 10.1002/hbm.25707. Epub 2021 Nov 11. PMID: 34761468.
Internal reliability of blame-related functional MRI measures in major depressive disorder.
Fennema D, O'Daly O, Barker GJ, Moll J, Zahn R. Neuroimage Clin. 2021;32:102901. doi: 10.1016/j.nicl.2021.102901. Epub 2021 Nov 28. PMID: 34911203.
The impact of neighborhood disadvantage on amygdala reactivity: Pathways through neighborhood social processes.
Suarez GL, Burt SA, Gard AM, Burton J, Clark DA, Klump KL, Hyde LW. Dev Cogn Neurosci. 2022 Apr;54:101061. doi: 10.1016/j.dcn.2022.101061. Epub 2022 Jan 12. PMID: 35042163.

References

1. Barch D. M., Burgess G. C., Harms M. P., Petersen S. E., Schlaggar B. L., Corbetta M. . . . WU-Minn HCP Consortium. (2013). Function in the human connectome: Task-fMRI and individual differences in behavior. NeuroImage, 80, 169–189.
2. Bennett C. M., Miller M. B. (2010). How reliable are the results from functional magnetic resonance imaging? Annals of the New York Academy of Sciences, 1191, 133–155.
3. Borenstein M., Hedges L. V., Higgins J. P. T., Rothstein H. R. (2009). Introduction to meta-analysis. Chichester, England: John Wiley. doi:10.1002/9780470743386
4. Button K. S., Ioannidis J. P. A., Mokrysz C., Nosek B. A., Flint J., Robinson E. S. J., Munafò M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
5. Champely S. (2018). Package ‘pwr.’ Retrieved from http://cran.r-project.org/package=pwr
