R Soc Open Sci. 2018 Aug 15;5(8):180448.
doi: 10.1098/rsos.180448. eCollection 2018 Aug.

Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition


Tom E Hardwicke et al. R Soc Open Sci. 2018.

Abstract

Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data ('analytic reproducibility'). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
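The percentages reported in the abstract follow directly from the stated counts; a minimal Python check (counts taken verbatim from the abstract, labels are this sketch's own shorthand):

```python
# Counts (numerator, denominator) as reported in the abstract.
counts = {
    "data available statements, pre-policy": (104, 417),
    "data available statements, post-policy": (136, 174),
    "reusable data, pre-policy": (23, 104),
    "reusable data, post-policy": (85, 136),
}

for label, (k, n) in counts.items():
    # Round to the nearest whole percent, as in the abstract.
    print(f"{label}: {k}/{n} = {round(100 * k / n)}%")
```

Note that 85/136 is exactly 62.5%, which Python's round-half-to-even rule reports as 62%, matching the abstract.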

Keywords: interrupted time series; journal policy; meta-science; open data; open science; reproducibility.


Conflict of interest statement

M.C.F. was an Associate Editor at the journal Cognition during the study. The other authors have no competing interests.

Figures

Figure 1.
Proportion of articles with data available statements as a function of submission date across the assessment period. For ease of presentation, circles indicate proportions in 50-day bins with the circle area representing the total number of articles in each bin (but note that the analysis model was fitted to individual articles). Solid red lines represent predictions of an interrupted time-series analysis segmented by pre-policy and post-policy periods. The dashed red line estimates, based on the pre-policy period, the trajectory of data available statement inclusion if the policy had no effect. The model is linear on the logit scale, whereas the y-axis of the figure is on the probability scale, which is a nonlinear transformation of the logit. Confidence bands (red) indicate 95% CIs. Note that the small article numbers in the extremes of the graph are due to long submission-to-publication lag times. Our sample selection was based on the publication date, but it is the submission date which determines whether an article falls within the pre-policy or post-policy period.
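The model the caption describes is, in essence, a logistic regression whose linear predictor is segmented at the policy introduction: a pre-policy intercept and slope, plus a level change and slope change that switch on post-policy. A minimal pure-Python sketch of that structure (the coefficients and policy date below are illustrative assumptions, not the paper's fitted estimates):

```python
import math

def sigmoid(x):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative coefficients on the logit scale (NOT the paper's estimates):
# b0: baseline logit at t = 0; b1: pre-policy slope per day;
# b2: level change at the policy; b3: post-policy slope change.
b0, b1, b2, b3 = -1.2, 0.002, 1.5, 0.004
t_policy = 400  # hypothetical policy date, in days from the series start

def p_data_statement(t):
    """Model-implied probability that an article submitted on day t
    includes a data available statement (solid lines in the figure)."""
    post = 1 if t >= t_policy else 0
    logit = b0 + b1 * t + b2 * post + b3 * (t - t_policy) * post
    return sigmoid(logit)

def p_counterfactual(t):
    """Pre-policy trend extrapolated as if the policy had no effect
    (the dashed line in the figure)."""
    return sigmoid(b0 + b1 * t)
```

The difference `p_data_statement(t) - p_counterfactual(t)` for post-policy `t` is the estimated policy effect; because the model is linear on the logit scale, that difference is nonlinear on the probability scale shown on the y-axis, as the caption notes.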
Figure 2.
Counts and percentages for articles in the pre- and post-policy periods with available statements, accessible data, complete data and understandable data. Only accessible, complete and understandable data are considered ‘reusable in principle’. Arrow size represents the proportion of total articles.
Figure 3.
Reproducibility outcomes for all 1324 checked values, as a function of article and value type (n = count/proportion; ci = confidence interval; misc = miscellaneous; M = mean/median; df = degrees of freedom; es = effect size; test = test statistic; p = p-value; sd/se = standard deviation/standard error). Bold red X marks indicate non-reproducible values (major errors) and grey circles indicate reproducible values. Symbol size represents the number of values. Both axes are ordered by an increasing number of errors towards the graph origin. The article colours represent the overall outcome: not fully reproducible despite author assistance (red), reproducible with author assistance (orange) and reproducible without author assistance (green). For articles marked with asterisks (*), the analysis could not be completed and there was insufficient information to determine whether original conclusions were affected. In all other cases, it is unlikely that original conclusions were affected.
Figure 4.
Locus of non-reproducibility based on discrete issues identified in each article. Circles indicate reproducibility issues resolved through author assistance, and X marks indicate unresolved reproducibility issues. Symbol size represents the number of discrete reproducibility issues. Left panel represents articles that were not fully reproducible despite author assistance (some issues may have been resolved but others remain). Right panel represents articles that were reproducible with author assistance (all issues were resolved). Both axes are ordered by an increasing number of discrete reproducibility issues towards the origin.
