Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15343-7.

doi: 10.1073/pnas.1516179112. Epub 2015 Nov 9.

Using prediction markets to estimate the reproducibility of scientific research

Anna Dreber¹, Thomas Pfeiffer², Johan Almenberg³, Siri Isaksson⁴, Brad Wilson⁵, Yiling Chen⁶, Brian A Nosek⁷, Magnus Johannesson⁴

Affiliations

¹ Department of Economics, Stockholm School of Economics, SE-113 83 Stockholm, Sweden; anna.dreber@hhs.se.
² New Zealand Institute for Advanced Study, Massey University, Auckland 0745, New Zealand; Wissenschaftskolleg zu Berlin-Institute for Advanced Study, D-14193 Berlin, Germany;
³ Sveriges Riksbank, SE-103 37 Stockholm, Sweden;
⁴ Department of Economics, Stockholm School of Economics, SE-113 83 Stockholm, Sweden;
⁵ Consensus Point, Nashville, TN 37203;
⁶ John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138;
⁷ Department of Psychology, University of Virginia, Charlottesville, VA 22904; Center for Open Science, Charlottesville, VA 22903.

PMID: 26553988
PMCID: PMC4687569
DOI: 10.1073/pnas.1516179112

Anna Dreber et al. Proc Natl Acad Sci U S A. 2015.

Abstract

Concerns about a lack of reproducibility of statistically significant results have recently been raised in many fields, and it has been argued that this lack comes at substantial economic costs. We here report the results from prediction markets set up to quantify the reproducibility of 44 studies published in prominent psychology journals and replicated in the Reproducibility Project: Psychology. The prediction markets predict the outcomes of the replications well and outperform a survey of market participants' individual forecasts. This shows that prediction markets are a promising tool for assessing the reproducibility of published scientific results. The prediction markets also allow us to estimate probabilities for the hypotheses being true at different testing stages, which provides valuable information regarding the temporal dynamics of scientific discovery. We find that the hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%) and that a "statistically significant" finding needs to be confirmed in a well-powered replication to have a high probability of being true. We argue that prediction markets could be used to obtain speedy information about reproducibility at low cost and could potentially even be used to determine which studies to replicate to optimally allocate limited resources into replications.

Keywords: prediction markets; replications; reproducibility.

Conflict of interest statement

Consensus Point employs B.W. and provided the online market interface used in the experiment. The market interface is commercial software.

Figures

**Fig. 1.**
Prediction market performance. Final market prices and survey predictions are shown for the replication of 44 publications from three top psychology journals. The prediction market predicts 29 out of 41 replications correctly, yielding better predictions than a survey carried out before the trading started. Successful replications (16 of 41 replications) are shown in black, and failed replications (25 of 41) are shown in red. Gray symbols are replications that remained unfinished (3 of 44).

**Fig. 2.**
Relationship between market price and prior and posterior probabilities p₀, p₁, and p₂ of the hypothesis under investigation. Bayesian inference (green arrows) assigns an initial (prior) probability p₀ to a hypothesis, indicating its plausibility in the absence of a direct test. Results from an initial study allow this prior probability to be updated to posterior p₁, which in turn determines the chances for the initial result to hold up in a replication, and thus the market price in the prediction market. Once the replication has been performed, the result can be used to generate posterior p₂. Observing the market price, and using the statistical characteristics of the initial study and the replication, we can thus reconstruct probabilities p₁, p₂, and p₀. Detailed calculations are presented in *Supporting Information*.

**Fig. 3.**
Probability of a hypothesis being true at three different stages of testing: before the initial study (p₀), after the initial study but before the replication (p₁), and after replication (p₂). “Error bars” (or whiskers) represent range, boxes are first to third quartiles, and thick lines are medians. Initially, priors of the tested hypotheses are relatively low, with a median of 8.8% (range, 0.7–66%). A positive result in an initial publication then moves the prior into a broad range of intermediate levels, with a median of 56% (range, 10–97%). If replicated successfully, the probability moves further up, with a median of 98% (range, 93.0–99.2%). If the replication fails, the probability moves back to a range close to the initial prior, with a median of 6.3% (range, 0.01–80%).

**Fig. S1.**
Final positions per participant and market. The left panel shows the portfolios in the first set of prediction markets, and the right panel shows the portfolios for the second set of prediction markets. Long positions (bets on success) are shown in green, and short positions (bets on failure) are shown in red. This figure indicates that, in both sets of prediction markets, the participants had broad portfolios with positions in several markets. Similarly, each market attracted a number of traders. Often, traders have diverging views: in each market, there is at least one trader holding a long position, and one trader holding a short position. The final portfolios show that there are a few “bears” (predominantly betting on failure) who invested in short positions only (6 of 47 traders for the first set of markets; 4 of 45 traders for the second set of markets), and “bulls” (predominantly betting on success) who invested in long positions only (3 of 47 traders for the first set of markets; 6 of 45 traders for the second set of markets). However, most of the participants fall into a wide spectrum between these two extremes.

**Fig. S2.**
(A) Trading interface introductory page. When entering the prediction market, participants were presented with all hypotheses along with their current price (“score”) and recent change in price. By clicking Adjust, the participants received more information on the study and the possibility to trade by buying and selling (a). For each replication, participants were presented with the hypothesis, the authors, the title, and the journal, and could buy stocks by choosing Yes or sell stocks by choosing No (b), and enter how many points they would like to invest in the specific hypothesis (c). (B) Position summary presented participants with an overview of their investments: which hypotheses, number of shares held, and current market value.

**Fig. S3.**
Comparison of survey responses and behavior in the two prediction markets. (A) Correlation between market price and average survey response. Market prices and average survey responses are positively correlated, suggesting that information given in the surveys was also revealed in the market (Pearson correlation coefficient of 0.78, P < 0.001, n = 43). However, market prices are more “extreme” than survey responses, which translates into a lower prediction error. Studies that were replicated successfully are shown in black, and studies that failed to replicate are shown in red. Studies that remained unfinished are shown in gray. (B) Correlation between volume of traded shares and diversity in survey responses (i.e., SD of responses; Pearson correlation coefficient of 0.51, P < 0.001, n = 43). The positive correlation between volume in the market and diversity in the surveys suggests that there was more trading for studies where participants had more diverging views on the replicability of a study. In other words, when there is larger diversity in premarket views, more trades are required to reach a “consensus” in the market pricing. (C) Negative correlation between market price and diversity in survey responses (Pearson correlation coefficient of −0.53, P < 0.001, n = 43). The diversity of survey responses is higher when the prediction market predicts a low probability that the original result will be replicated. This suggests that there is more disagreement around replications that are overall expected to fail than around replications expected to succeed.
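
The chain of probabilities described in the Fig. 2 caption can be illustrated with a minimal Python sketch. It is not the authors' calculation, which is given in their *Supporting Information*; it applies textbook Bayes' rule under the simplifying assumptions that a study is summarized by its statistical power and its false-positive rate (alpha), and that the market price equals the probability of a significant replication. All function names and the numbers in the usage example are hypothetical.

```python
def posterior_after_positive(prior: float, power: float, alpha: float) -> float:
    """P(hypothesis true | significant result), by Bayes' rule."""
    true_pos = prior * power            # hypothesis true and detected
    false_pos = (1.0 - prior) * alpha   # hypothesis false, false positive
    return true_pos / (true_pos + false_pos)


def posterior_after_negative(prior: float, power: float, alpha: float) -> float:
    """P(hypothesis true | non-significant result), by Bayes' rule."""
    false_neg = prior * (1.0 - power)
    true_neg = (1.0 - prior) * (1.0 - alpha)
    return false_neg / (false_neg + true_neg)


def replication_probability(p1: float, power: float, alpha: float) -> float:
    """Chance that a replication is significant, given belief p1."""
    return p1 * power + (1.0 - p1) * alpha


def p1_from_price(price: float, power: float, alpha: float) -> float:
    """Invert replication_probability: recover p1 from a market price.

    Assumes (hypothetically) that price equals the replication probability.
    """
    if not alpha <= price <= power:
        raise ValueError("price must lie between alpha and power")
    return (price - alpha) / (power - alpha)


def prior_from_posterior(p1: float, power: float, alpha: float) -> float:
    """Invert posterior_after_positive: recover p0 from p1."""
    return p1 * alpha / (p1 * alpha + (1.0 - p1) * power)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    price = 0.55                          # final market price
    orig_power, rep_power = 0.80, 0.92    # assumed power of each study
    alpha = 0.05                          # assumed false-positive rate

    p1 = p1_from_price(price, rep_power, alpha)
    p0 = prior_from_posterior(p1, orig_power, alpha)
    p2_success = posterior_after_positive(p1, rep_power, alpha)
    p2_failure = posterior_after_negative(p1, rep_power, alpha)
    print(f"p0={p0:.3f}  p1={p1:.3f}  "
          f"p2|success={p2_success:.3f}  p2|failure={p2_failure:.3f}")
```

The sketch works backward from the price to p₁ and p₀, then forward to p₂, mirroring the order of reconstruction named in the caption. A fuller treatment would need to specify whether a replication counts as successful only when the effect has the same direction as the original, which changes the effective false-positive rate.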

Comment in

Cracking the brain's genetic code.
Thompson PM. Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15269-70. doi: 10.1073/pnas.1520702112. Epub 2015 Nov 18. PMID: 26582794.
Markets for replication.
Brandon A, List JA. Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15267-8. doi: 10.1073/pnas.1521417112. Epub 2015 Dec 2. PMID: 26631745.
