Cereb Cortex. 2017 Nov 1;27(11):5415-5429.
doi: 10.1093/cercor/bhx230.

Influences on the Test-Retest Reliability of Functional Connectivity MRI and its Relationship with Behavioral Utility

Stephanie Noble et al. Cereb Cortex. 2017.

Abstract

Best practices are currently being developed for the acquisition and processing of resting-state magnetic resonance imaging data used to estimate brain functional organization, or "functional connectivity." Standards have been proposed based on test-retest reliability, but open questions remain. These include how the amount of data per subject influences whole-brain reliability, the influence of increasing runs versus sessions, the spatial distribution of reliability, the reliability of multivariate methods, and, crucially, how reliability maps onto prediction of behavior. We collected a dataset of 12 extensively sampled individuals (144 min data each across 2 identically configured scanners) to assess test-retest reliability of whole-brain connectivity within the generalizability theory framework. We used Human Connectome Project data to replicate these analyses and relate reliability to behavioral prediction. Overall, the historical 5-min scan produced poor reliability averaged across connections. Increasing the number of sessions was more beneficial than increasing runs. Reliability was lowest for subcortical connections and highest for within-network cortical connections. Multivariate reliability was greater than univariate. Finally, reliability could not be used to improve prediction; these findings are among the first to underscore this distinction for functional connectivity. A comprehensive understanding of test-retest reliability, including its limitations, supports the development of best practices in the field.

Keywords: behavioral prediction; multivariate; resting state functional connectivity; test–retest reliability; whole brain.

Figures

Figure 1.
Study design. A total of 12 subjects were each scanned at 4 sessions (2 scanners × 2 days), with each session comprising six 6-min runs for a total of 36 min of data per session. All scans were acquired with subjects at rest with eyes open. Connectivity matrices were obtained independently for each run.
Figure 2.
Effect of number of sessions and scan duration on mean test–retest reliability over all edges. A Decision Study was performed to estimate absolute reliability (Φ) as a function of scan duration (min) and number of sessions. Brighter colors correspond to higher levels of test–retest reliability, and results are categorized as follows: poor < 0.4, fair = 0.4–0.59, good = 0.6–0.74, excellent ≥ 0.74 (Cicchetti and Sparrow 1981). The asymmetry across the diagonal indicates that test–retest reliability improves more quickly with increasing number of sessions than with increasing scan durations. Accordingly, for a single session, even with 36 min of data, only poor test–retest reliability is obtained. However, fair test–retest reliability can be obtained with only 24 min of data collected over 4 sessions of 6 min each. Good test–retest reliability requires 96–108 min of data divided over 3–4 sessions. Excellent test–retest reliability cannot be achieved using the maximum amount of data collected (4 sessions × 36 min, or 2.4 h in total).
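A Decision Study of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' model: it assumes a fully crossed subject × session × run design (the study's scanner and day facets are folded into "session"), and the variance components are hypothetical placeholders rather than values from the paper. Φ is the subject variance divided by the sum of subject variance and absolute error variance, with each error component divided by the number of conditions it is averaged over. With 6-min runs, scan duration per session is 6 × the number of runs.

```python
def phi(var, n_sessions, n_runs):
    """Absolute reliability (Phi) for a crossed subject x session x run design.

    `var` maps component names to variance estimates:
    p (subject), s (session), r (run), ps, pr, sr, psr_e (residual).
    """
    abs_error = (
        (var["s"] + var["ps"]) / n_sessions
        + (var["r"] + var["pr"]) / n_runs
        + (var["sr"] + var["psr_e"]) / (n_sessions * n_runs)
    )
    return var["p"] / (var["p"] + abs_error)


def categorize(value):
    """Reliability categories as stated in the caption (Cicchetti and Sparrow 1981)."""
    if value < 0.4:
        return "poor"
    if value < 0.6:
        return "fair"
    if value < 0.74:
        return "good"
    return "excellent"


# Hypothetical variance components, chosen only so that the
# subject-by-session interaction dominates the subject-by-run interaction.
HYPOTHETICAL_VAR = {
    "p": 0.20, "s": 0.0, "r": 0.0,
    "ps": 0.30, "pr": 0.02, "sr": 0.0, "psr_e": 0.48,
}

if __name__ == "__main__":
    for n_sessions in (1, 2, 3, 4):
        for n_runs in (1, 2, 3, 4, 5, 6):
            value = phi(HYPOTHETICAL_VAR, n_sessions, n_runs)
            print(n_sessions, 6 * n_runs, round(value, 3), categorize(value))
```

When the subject-by-session component is large relative to the subject-by-run component, as in these placeholder values, adding sessions shrinks the error term faster than adding runs, which is the kind of asymmetry the caption describes.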
Figure 3.
Spatial distribution of test–retest reliability, organized by node. (a) Mean test–retest reliability (Φ) of connectivity at each node. For each node, the mean test–retest reliability of all edges associated with that node was calculated for a single session with a 36-min scan duration. Brighter colors correspond to higher levels of test–retest reliability, and results are categorized as follows: poor < 0.4, fair = 0.4–0.59, good = 0.6–0.74, excellent ≥ 0.74 (Cicchetti and Sparrow 1981). Cortical nodes exhibited greater test–retest reliability than noncortical nodes. (b) Minimum scan duration needed to achieve mean fair test–retest reliability at each node for a single session. Brighter colors correspond to shorter scan durations, and are scaled differently than for (a). Cortical nodes became reliable at shorter scan durations than noncortical nodes.
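The two node-level summaries described in this caption can be sketched as below. The function names and the toy inputs are hypothetical; the sketch assumes a symmetric edge-wise Φ matrix whose diagonal (self-connections) is ignored, and a mapping from scan duration in minutes to a node's mean Φ at that duration.

```python
def node_mean_reliability(phi_matrix):
    """Mean reliability of all edges attached to each node (diagonal excluded)."""
    n = len(phi_matrix)
    return [
        sum(phi_matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def min_duration_reaching(threshold, phi_by_duration):
    """Shortest scan duration whose mean Phi meets the threshold, else None."""
    for duration in sorted(phi_by_duration):
        if phi_by_duration[duration] >= threshold:
            return duration
    return None


if __name__ == "__main__":
    # Toy 3-node example with made-up edge reliabilities.
    toy = [
        [0.0, 0.2, 0.4],
        [0.2, 0.0, 0.6],
        [0.4, 0.6, 0.0],
    ]
    print(node_mean_reliability(toy))
    # Made-up mean Phi for one node at each scan duration (min).
    print(min_duration_reaching(0.4, {6: 0.21, 12: 0.33, 18: 0.41, 24: 0.45}))
```

Returning `None` when no duration meets the threshold mirrors the case of nodes that never reach a given category within the durations tested.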
Figure 4.
(a) Spatial distribution of test–retest reliability, summarized by network. For each pair of networks, the mean test–retest reliability (Φ) of all edges between those networks was calculated for a single session with variable scan duration (as noted in figures, scan duration = 6, 12, 18, 24, 30, and 36 min). The frontoparietal, medial frontal, and secondary visual networks showed the highest test–retest reliability. Brighter colors correspond to higher levels of test–retest reliability, and results are categorized as follows: poor < 0.4, fair = 0.4–0.59, good = 0.6–0.74, excellent ≥ 0.74 (Cicchetti and Sparrow 1981). (b) Minimum scan duration needed to achieve mean fair, good, and excellent test–retest reliability, summarized by network. For each pair of networks, the mean test–retest reliability of all edges between those networks was calculated (Φ) for a single session with variable scan duration (scan duration = 6, 12, 18, 24, 30, and 36 min), as above. The minimum scan duration resulting in mean fair, good, and excellent test–retest reliability for that network pair was then determined. Only within-network frontoparietal connectivity reached good test–retest reliability in a single session. No network pair reached excellent test–retest reliability. Subcortical regions never reached fair test–retest reliability. Brighter colors correspond with shorter scan durations. MF, medial frontal; FP, frontoparietal; DMN, default mode; Mot, Motor; VI, visual I; VII, visual II; VAs, visual association; Lim, limbic; BG, basal ganglia (including thalamus and striatum); CBL, cerebellum.
Figure 5.
Effect of number of sessions and scan duration on multivariate test–retest reliability of the connectivity matrix. A Decision Study was performed to estimate multivariate absolute test–retest reliability (Φ) as a function of scan duration (min) and number of sessions. Brighter colors correspond to higher levels of test–retest reliability, and results are categorized as follows: poor < 0.4, fair = 0.4–0.59, good = 0.6–0.74, excellent ≥ 0.74 (Cicchetti and Sparrow 1981). The asymmetry across the diagonal indicates that test–retest reliability improves more quickly with increasing number of sessions than with increasing scan durations. Accordingly, fair test–retest reliability can be obtained with less total data if data is acquired with multiple sessions than if only a single session is used. Unlike in the univariate case, mean fair test–retest reliability can be achieved in a single session and mean excellent test–retest reliability can be achieved using the maximum amount of data collected (4 sessions × 36 min).
Figure 6.
Difference in test–retest reliability between edges predictive and not predictive of gF. Edges categorized as predictive of gF are those included in predictive networks for every fold of the cross-validation; the distribution of predictive edges is shown at left. Tukey boxplots show median (red line), data between first and third quartile (edges of box), and suspected outliers (whiskers and red crosses; beyond 1.5 inter-quartile range [IQR]).
Figure 7.
Edge-wise relationship between test–retest reliability and behavioral relevance. Behavioral relevance refers to the correlation between that edge's strength and fluid intelligence (gF) across all subjects (abs(r)). For the plot of the spatial distribution of behavioral relevance (top left), warmer colors are more positively correlated with behavior, and cooler colors are more negatively correlated with behavior. For the plot of test–retest reliability (bottom left), brighter colors are more reliable. For the scatterplot (right), each point represents a single edge. In the scatterplot, the absolute value of behavioral relevance is plotted to facilitate finding effects related to the magnitude of relevance.
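Behavioral relevance as defined in this caption, the absolute Pearson correlation between an edge's strength and gF across subjects, can be sketched as below. The function names and toy numbers are hypothetical, and the sketch assumes no edge or score vector is constant across subjects (a constant vector would make the correlation undefined).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def behavioral_relevance(edge_strengths, scores):
    """abs(r) per edge; `edge_strengths` holds one per-subject list per edge."""
    return [abs(pearson_r(edge, scores)) for edge in edge_strengths]


if __name__ == "__main__":
    # Toy data: 2 edges, 4 subjects, made-up gF scores.
    edges = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.1, 0.3, 0.2]]
    gf = [10, 12, 14, 16]
    print(behavioral_relevance(edges, gf))
```

Taking the absolute value places positively and negatively correlated edges on the same scale, so each edge can be compared against its reliability by magnitude alone.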
Figure 8.
Influence of reliable and unreliable edges on prediction of fluid intelligence. An increasing number of unreliable edges are removed toward the right of the x-axis, and an increasing number of reliable edges are removed toward the left. Removal of unreliable edges starts with all edges showing Φ = 0, then removing in intervals of 3200; removal of reliable edges also occurs in increments of 3200 (+1 for the first interval). The removal of 2/3 of all edges (23 852 edges removed) is marked with vertical lines; changes in performance greater than 0.025 from performance with all edges are marked with horizontal lines.
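One possible reading of the unreliable-edge removal schedule is sketched below; it is an interpretation of the caption, not the authors' code. It assumes that Φ values are non-negative, so that edges with Φ = 0 sort first, and that each later step removes a further fixed number of the least reliable remaining edges. The function name is hypothetical, and the prediction step that would be rerun on the surviving edges is not shown.

```python
def unreliable_removal_schedule(phi_values, step=3200):
    """Cumulative lists of removed edge indices, least reliable first.

    The first entry removes every edge with Phi == 0 (assumes Phi >= 0);
    each later entry removes `step` more edges. Stops before all edges are gone.
    """
    order = sorted(range(len(phi_values)), key=lambda i: phi_values[i])
    n_removed = sum(1 for value in phi_values if value == 0)
    schedule = []
    while n_removed < len(order):
        schedule.append(order[:n_removed])
        n_removed += step
    return schedule


if __name__ == "__main__":
    # Toy example: 6 edges with made-up reliabilities, removing 2 per step.
    toy_phi = [0.5, 0.0, 0.2, 0.0, 0.9, 0.1]
    for removed in unreliable_removal_schedule(toy_phi, step=2):
        print(removed)
```

The 23 852 edges marked in the figure as 2/3 of all edges imply 35 778 edges in total, so with a step of 3200 the schedule contains on the order of a dozen removal levels.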
