Nature. 2020 Jun;582(7810):84-88.
doi: 10.1038/s41586-020-2314-9. Epub 2020 May 20.

Variability in the analysis of a single neuroimaging dataset by many teams

Rotem Botvinik-Nezer  1   2   3 Felix Holzmeister  4 Colin F Camerer  5 Anna Dreber  6   7 Juergen Huber  4 Magnus Johannesson  6 Michael Kirchler  4 Roni Iwanir  1   2 Jeanette A Mumford  8 R Alison Adcock  9   10 Paolo Avesani  11   12 Blazej M Baczkowski  13 Aahana Bajracharya  14 Leah Bakst  15   16 Sheryl Ball  17   18 Marco Barilari  19 Nadège Bault  20 Derek Beaton  21 Julia Beitner  22   23 Roland G Benoit  24 Ruud M W J Berkers  24 Jamil P Bhanji  25 Bharat B Biswal  26   27 Sebastian Bobadilla-Suarez  28 Tiago Bortolini  29 Katherine L Bottenhorn  30 Alexander Bowring  31 Senne Braem  32   33 Hayley R Brooks  34 Emily G Brudner  25 Cristian B Calderon  32 Julia A Camilleri  35   36 Jaime J Castrellon  9   37 Luca Cecchetti  38 Edna C Cieslik  35   36 Zachary J Cole  39 Olivier Collignon  12   19 Robert W Cox  40 William A Cunningham  41 Stefan Czoschke  42 Kamalaker Dadi  43 Charles P Davis  44   45   46 Alberto De Luca  47 Mauricio R Delgado  25 Lysia Demetriou  48   49 Jeffrey B Dennison  50 Xin Di  26   27 Erin W Dickie  51   52 Ekaterina Dobryakova  53 Claire L Donnat  54 Juergen Dukart  35   36 Niall W Duncan  55   56 Joke Durnez  57 Amr Eed  58 Simon B Eickhoff  35   36 Andrew Erhart  34 Laura Fontanesi  59 G Matthew Fricke  60 Shiguang Fu  61   62 Adriana Galván  63 Remi Gau  19 Sarah Genon  35   36 Tristan Glatard  64 Enrico Glerean  65 Jelle J Goeman  66 Sergej A E Golowin  55 Carlos González-García  32 Krzysztof J Gorgolewski  67 Cheryl L Grady  21 Mikella A Green  9   37 João F Guassi Moreira  63 Olivia Guest  28   68 Shabnam Hakimi  9 J Paul Hamilton  69 Roeland Hancock  45   46 Giacomo Handjaras  38 Bronson B Harry  70 Colin Hawco  71 Peer Herholz  72 Gabrielle Herman  71 Stephan Heunis  73   74 Felix Hoffstaedter  35   36 Jeremy Hogeveen  75   76 Susan Holmes  54 Chuan-Peng Hu  77 Scott A Huettel  37 Matthew E Hughes  78 Vittorio Iacovella  12 Alexandru D Iordan  79 Peder M Isager  80 Ayse I Isik  81 Andrew Jahn  82 Matthew R Johnson  39   
83 Tom Johnstone  78 Michael J E Joseph  71 Anthony C Juliano  84 Joseph W Kable  85   86 Michalis Kassinopoulos  87 Cemal Koba  38 Xiang-Zhen Kong  88 Timothy R Koscik  89 Nuri Erkut Kucukboyaci  53   90 Brice A Kuhl  91 Sebastian Kupek  92 Angela R Laird  93 Claus Lamm  94   95 Robert Langner  35   36 Nina Lauharatanahirun  96   97 Hongmi Lee  98 Sangil Lee  85 Alexander Leemans  47 Andrea Leo  38 Elise Lesage  32 Flora Li  99   100 Monica Y C Li  44   45   46   101 Phui Cheng Lim  39   83 Evan N Lintz  39 Schuyler W Liphardt  102 Annabel B Losecaat Vermeer  94 Bradley C Love  28   103 Michael L Mack  41 Norberto Malpica  104 Theo Marins  29 Camille Maumet  105 Kelsey McDonald  37 Joseph T McGuire  15   16 Helena Melero  104   106   107 Adriana S Méndez Leal  63 Benjamin Meyer  77   108 Kristin N Meyer  109 Glad Mihai  110   111 Georgios D Mitsis  112 Jorge Moll  29   67 Dylan M Nielson  113 Gustav Nilsonne  114   115 Michael P Notter  116 Emanuele Olivetti  11   12 Adrian I Onicas  38 Paolo Papale  38   117 Kaustubh R Patil  35   36 Jonathan E Peelle  14 Alexandre Pérez  72 Doris Pischedda  118   119   120 Jean-Baptiste Poline  72   121 Yanina Prystauka  44   45   46 Shruti Ray  26 Patricia A Reuter-Lorenz  79 Richard C Reynolds  122 Emiliano Ricciardi  38 Jenny R Rieck  21 Anais M Rodriguez-Thompson  109 Anthony Romyn  41 Taylor Salo  30 Gregory R Samanez-Larkin  9   37 Emilio Sanz-Morales  104 Margaret L Schlichting  41 Douglas H Schultz  39   83 Qiang Shen  61   62 Margaret A Sheridan  109 Jennifer A Silvers  63 Kenny Skagerlund  123   124 Alec Smith  17   18 David V Smith  50 Peter Sokol-Hessner  34 Simon R Steinkamp  125 Sarah M Tashjian  63 Bertrand Thirion  43 John N Thorp  126 Gustav Tinghög  127   128 Loreen Tisdall  67   129 Steven H Tompson  96 Claudio Toro-Serey  15   16 Juan Jesus Torre Tresols  43 Leonardo Tozzi  130 Vuong Truong  55   56 Luca Turella  12 Anna E van 't Veer  131 Tom Verguts  32 Jean M Vettel  132   133   134 Sagana Vijayarajah  41 
Khoi Vo  9   37 Matthew B Wall  135   136   137 Wouter D Weeda  131 Susanne Weis  35   36 David J White  138 David Wisniewski  32 Alba Xifra-Porxas  87 Emily A Yearling  44   45   46 Sangsuk Yoon  139 Rui Yuan  130 Kenneth S L Yuen  77   108 Lei Zhang  94 Xu Zhang  45   46   140 Joshua E Zosky  39   83 Thomas E Nichols  141 Russell A Poldrack  142 Tom Schonberg  143   144

Rotem Botvinik-Nezer et al. Nature. 2020 Jun.

Abstract

Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.


Figures

Extended Data Figure 1. Voxels overlap.
Maps showing at each voxel the proportion of teams (out of N = 65 teams) reporting significant activations in their thresholded statistical map, for each hypothesis (labeled H1 - H9), thresholded at 10% (i.e., voxels with no color were significant in fewer than 10% of teams). +/− refers to direction of effect, gain/loss refers to the effect being tested, and equal indifference (EI) / equal range (ER) refers to the group being examined or compared. Hypotheses #1 and #3, as well as hypotheses #2 and #4, share the same statistical maps as the hypotheses are for the same contrast and experimental group, but for different regions (see Extended Data Table 1). Images can be viewed at https://identifiers.org/neurovault.collection:6047
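
The overlap computation described in this caption can be sketched as follows. This is a minimal illustration, assuming each team's thresholded map has been flattened into a list of 0/1 flags; the function names and the toy data are hypothetical and are not taken from the study.

```python
def overlap_proportion(team_maps):
    """Per-voxel fraction of teams whose thresholded map marks the voxel significant.

    team_maps: list of equal-length lists of 0/1 flags, one list per team
    (a flattened, hypothetical stand-in for thresholded statistical maps).
    """
    n_teams = len(team_maps)
    n_voxels = len(team_maps[0])
    return [sum(m[v] for m in team_maps) / n_teams for v in range(n_voxels)]


def mask_below(proportions, min_fraction=0.10):
    """Replace values under the display threshold with None (left uncolored)."""
    return [p if p >= min_fraction else None for p in proportions]


# Toy data: 20 teams, 3 voxels (illustrative only, not from the study).
maps = [[1, 0, 0]] * 12 + [[1, 1, 0]] * 1 + [[0, 0, 0]] * 7
props = overlap_proportion(maps)
shown = mask_below(props)
print(props)
print(shown)
```

In the toy data the second voxel is significant for only 1 of 20 teams, so it falls under the 10% display threshold and is masked.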
Extended Data Figure 2. Variability of whole-brain unthresholded maps for hypotheses 2, 4 - 9.
For each hypothesis, we present a heatmap based on Spearman correlations between unthresholded statistical maps (N = 64), clustered according to their similarity, and the average of unthresholded images for each cluster (cluster colors in titles refer to colors in left margin of heatmap). Green / red at the columns represent binary results (significant / not significant, respectively) reported by the analysis teams; row colors represent cluster membership. Maps are thresholded at an uncorrected value of Z > 2 for visualization. Unthresholded maps for Hypothesis #2 and Hypothesis #4 are identical (as they both relate to the same contrast and group, but different regions), and the colors represent reported results for Hypothesis #2. For Hypotheses #1 and #3 see Figure 2.
Extended Data Figure 3. Variability and consensus of unthresholded statistical maps (N = 64).
(a) Maps of estimated between-team variability (tau) at each voxel for each hypothesis. Images can be viewed at https://identifiers.org/neurovault.collection:6050. (b) Image-based meta-analysis (IBMA) results. A consensus analysis was performed on the unthresholded statistical maps to obtain a group statistical map for each hypothesis, accounting for the correlation between teams due to the same underlying data (see Methods). Maps are presented for each hypothesis showing voxels (in color) where the group statistic was significantly greater than zero after voxelwise correction for false discovery rate (p < 0.05). Color bar reflects statistical value (Z) for the meta-analysis. Images can be viewed at https://identifiers.org/neurovault.collection:6051. Hypotheses #1 and #3, as well as Hypotheses #2 and #4, share the same unthresholded maps, as they relate to the same contrast and group but for different regions (see Extended Data Table 1).
Extended Data Figure 4. Results of the consistent thresholding and ROI selection analysis (N = 64).
(a) Activation for each hypothesis as determined using consistent thresholding (black: p < 0.001 and cluster size > 10 voxels; blue: FDR correction with p < 0.05) and ROI selection across teams (y-axis), versus actual proportion of teams reporting activation (x-axis). Numbers next to each symbol represent the hypothesis number for each point. (b) Results from re-thresholding of unthresholded maps using uncorrected (p < 0.001, cluster size k > 10) and false discovery rate correction (pFDR < 5%) and common anatomical regions of interest for each hypothesis. A team is recorded as having an activation if one or more significant voxels are found in the ROI. Results for image-based meta-analysis (IBMA) for each hypothesis are also presented, thresholded at pFDR < 5% as well.
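
The re-thresholding rule in this caption can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the false discovery rate correction is the Benjamini-Hochberg procedure (the caption does not name the procedure), it omits the cluster-size criterion, and the names `bh_reject` and `roi_activation` and the toy p-values are hypothetical.

```python
def bh_reject(p_values, q=0.05):
    """Benjamini-Hochberg (assumed procedure): flag p-values significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every p-value ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def roi_activation(significant, roi_mask):
    """A team counts as activated if one or more significant voxels lie in the ROI."""
    return any(s and r for s, r in zip(significant, roi_mask))


# Toy voxelwise p-values for one team (illustrative only).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.6]
sig = bh_reject(p_vals, q=0.05)
print(sig)
print(roi_activation(sig, [0, 1, 1, 0, 0]))  # ROI covers voxels 1 and 2
print(roi_activation(sig, [0, 0, 1, 1, 0]))  # ROI covers voxels 2 and 3
```

Applying one thresholding rule and one ROI to every team's unthresholded map removes two sources of between-team variation, which is what lets panel (a) compare the result against the proportions the teams actually reported.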
Extended Data Figure 5. Prediction markets over time (N = 240 observations [10 days × 24 hours]).
(a) Panel regressions. The table summarizes the results of pre-registered fixed-effects panel regressions of the absolute prediction errors (i.e., the absolute deviation of the market price from the fundamental value), computed on an hourly basis (average price of all transactions within an hour), on time and prediction market indicators. Standard errors are computed using a robust estimator. (b) Market prices for each of the nine hypotheses, shown separately for the team members (green) and non-team members (blue) prediction markets. The figure shows the average prediction market prices per hour for each of the two prediction markets over the time the markets were open (10 days, i.e., 240 hours). The gray line indicates the actual share of analysis teams reporting a significant result for the hypothesis (i.e., the fundamental value).
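
The dependent variable of the panel regressions, the hourly absolute prediction error, can be sketched as follows. This is a minimal illustration under assumed inputs: trades are given as hypothetical `(hour_index, price)` pairs, and the prices and fundamental value below are made up.

```python
from collections import defaultdict


def hourly_absolute_errors(trades, fundamental_value):
    """Return {hour: |average price in that hour - fundamental value|}.

    trades: iterable of (hour_index, price) pairs (hypothetical input format).
    """
    by_hour = defaultdict(list)
    for hour, price in trades:
        by_hour[hour].append(price)
    return {
        hour: abs(sum(prices) / len(prices) - fundamental_value)
        for hour, prices in sorted(by_hour.items())
    }


# Toy trades for one hypothesis (illustrative only).
trades = [(0, 0.70), (0, 0.80), (1, 0.60)]
errors = hourly_absolute_errors(trades, fundamental_value=0.37)
print(errors)
```

Hours with no transactions simply do not appear in the result; how such hours were handled in the study is not stated in this caption.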
Figure 1. Fraction of teams reporting a significant result and prediction market beliefs.
The figure depicts final market prices for the “team members” (blue dots; N = 83 active traders) and the “non-team members” (green dots; N = 65 active traders) markets, as well as the observed fraction of teams reporting significant results (fundamental value, pink dots; N = 70 analysis teams), and the corresponding 95% confidence intervals for each of the nine hypotheses (note that the hypotheses are sorted based on the fundamental value). Confidence intervals were constructed by assuming convergence of the binomial distribution towards the normal.
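
A confidence interval built on the normal approximation to the binomial can be sketched as follows. This assumes the standard Wald form, p ± z·sqrt(p(1 − p)/n); the caption does not spell out the exact formula, and the counts used below are hypothetical.

```python
import math


def wald_interval(successes, n, z=1.959964):
    """Normal-approximation (Wald) interval for a binomial proportion.

    z defaults to the two-sided 95% quantile of the standard normal.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical example: 50 of 100 teams report a significant result.
low, high = wald_interval(50, 100)
print(round(low, 3), round(high, 3))
```

This form is not clipped to [0, 1], so for proportions near 0 or 1 the interval can extend past the valid range.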
Figure 2. Analytic variability in whole-brain statistical results for Hypothesis 1.
Top panel: Spearman correlation values between whole-brain unthresholded statistical maps for each team (N = 64) were computed and clustered according to their similarity (using Ward clustering on Euclidean distances). Row colors (left) denote cluster membership, while column colors (top) represent hypothesis decisions (green: Yes, red: No). Brackets represent clustering. Bottom panel: Average statistical maps (thresholded at uncorrected z > 2.0) for each of the three clusters depicted in the left panel. The probability of reporting a positive hypothesis outcome (pYes) is presented for each cluster. Images can be viewed at https://identifiers.org/neurovault.collection:6048.
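
The pairwise Spearman correlations underlying the heatmap can be sketched as follows. This is a standard-library illustration that treats each team's unthresholded map as a flat list of voxel values; the helper names and toy maps are hypothetical. The clustering step is not shown. One way to apply it to the rows of the resulting matrix would be `scipy.cluster.hierarchy.linkage` with `method="ward"`, though the caption does not say which software the authors used.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_matrix(maps):
    """Spearman correlation = Pearson correlation of the rank-transformed maps."""
    ranked = [ranks(m) for m in maps]
    return [[pearson(a, b) for b in ranked] for a in ranked]


# Toy maps for three teams over four voxels (illustrative only).
team_maps = [
    [0.1, 2.3, 1.5, -0.4],
    [0.2, 4.0, 3.1, -1.0],   # same voxel ordering as the first team
    [3.0, -1.0, 0.0, 5.0],   # reversed ordering
]
corr = spearman_matrix(team_maps)
for row in corr:
    print([round(v, 2) for v in row])
```

Because Spearman correlation depends only on voxel ranks, two teams whose maps differ in scale but agree on the ordering of voxels correlate perfectly, as the first two toy maps do.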

References

    1. Botvinik-Nezer R et al. fMRI data of mixed gambles from the Neuroimaging Analysis Replication and Prediction Study. Scientific Data 6, 1–9 (2019).
    2. Dreber A et al. Using prediction markets to estimate the reproducibility of scientific research. Proceedings of the National Academy of Sciences 112, 15343–15347 (2015).
    3. Camerer CF et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
    4. Camerer CF et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour 2, 637–644 (2018).
    5. Forsell E et al. Predicting replication outcomes in the Many Labs 2 study. J. Econ. Psychol 75, 102117 (2019).

