Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15343-7.
doi: 10.1073/pnas.1516179112. Epub 2015 Nov 9.

Using prediction markets to estimate the reproducibility of scientific research

Anna Dreber et al. Proc Natl Acad Sci U S A.

Abstract

Concerns about a lack of reproducibility of statistically significant results have recently been raised in many fields, and it has been argued that this lack comes at substantial economic costs. Here we report the results from prediction markets set up to quantify the reproducibility of 44 studies published in prominent psychology journals and replicated in the Reproducibility Project: Psychology. The prediction markets predict the outcomes of the replications well and outperform a survey of market participants' individual forecasts. This shows that prediction markets are a promising tool for assessing the reproducibility of published scientific results. The prediction markets also allow us to estimate probabilities for the hypotheses being true at different testing stages, which provides valuable information regarding the temporal dynamics of scientific discovery. We find that the hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%) and that a "statistically significant" finding needs to be confirmed in a well-powered replication to have a high probability of being true. We argue that prediction markets could be used to obtain rapid information about reproducibility at low cost and could potentially even be used to determine which studies to replicate, so that limited replication resources are allocated optimally.

Keywords: prediction markets; replications; reproducibility.

Conflict of interest statement

Consensus Point employs B.W. and provided the online market interface used in the experiment. The market interface is commercial software.

Figures

Fig. 1.
Prediction market performance. Final market prices and survey predictions are shown for the replication of 44 publications from three top psychology journals. The prediction market predicts 29 out of 41 replications correctly, yielding better predictions than a survey carried out before the trading started. Successful replications (16 of 41 replications) are shown in black, and failed replications (25 of 41) are shown in red. Gray symbols are replications that remained unfinished (3 of 44).
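The caption counts how many replications the market "predicts correctly." One plausible reading, assumed here rather than taken from the paper, is that a final price above 0.5 predicts a successful replication and a price below 0.5 predicts a failure. The sketch below scores forecasts that way and also computes the Brier score (mean squared error of the forecast probability), a common way to compare probabilistic forecasters. The function names and the four-study data are hypothetical placeholders, not the study's data.

```python
# Hypothetical sketch: scoring probability forecasts against binary outcomes.
# Assumption: a forecast is "correct" when it lies on the same side of 0.5
# as the observed outcome (1 = replicated, 0 = failed to replicate).

def hit_count(forecasts, outcomes, threshold=0.5):
    """Number of forecasts whose side of `threshold` matches the outcome."""
    hits = 0
    for p, y in zip(forecasts, outcomes):
        predicted = 1 if p > threshold else 0
        hits += int(predicted == y)
    return hits

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)

if __name__ == "__main__":
    market = [0.85, 0.30, 0.62, 0.15]   # placeholder final market prices
    survey = [0.65, 0.45, 0.55, 0.35]   # placeholder mean survey forecasts
    replicated = [1, 0, 0, 0]           # placeholder replication outcomes
    print("market:", hit_count(market, replicated), brier_score(market, replicated))
    print("survey:", hit_count(survey, replicated), brier_score(survey, replicated))
```

With these placeholder numbers both forecasters make three of four correct calls, but the more extreme market prices earn the lower Brier score, which is the kind of advantage a pure hit count can hide.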
Fig. 2.
Relationship between market price and prior and posterior probabilities p0, p1, and p2 of the hypothesis under investigation. Bayesian inference (green arrows) assigns an initial (prior) probability p0 to a hypothesis, indicating its plausibility in the absence of a direct test. Results from an initial study allow this prior probability to be updated to the posterior p1, which in turn determines the chance that the initial result holds up in a replication, and thus the price in the prediction market. Once the replication has been performed, its result can be used to generate the posterior p2. By observing the market price and using the statistical characteristics of the initial study and the replication, we can thus reconstruct the probabilities p1, p2, and p0. Detailed calculations are presented in Supporting Information.
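The chain p0 → p1 → market price → p2 can be sketched with Bayes' rule. This is a simplified model under our own assumptions, not necessarily the calculation in the Supporting Information: a true hypothesis produces a significant result with probability equal to the study's power, a false one with probability alpha, and the final market price equals the probability that the replication succeeds given p1. All function names and parameter values are hypothetical.

```python
# Simplified sketch of the Fig. 2 reconstruction (assumptions, not the SI method):
#   P(significant | H true)  = power
#   P(significant | H false) = alpha
#   market price = P(replication succeeds | p1) = p1 * power_rep + (1 - p1) * alpha

def posterior_after_positive(prior, power, alpha):
    """Bayes' rule: P(H true | significant result)."""
    return power * prior / (power * prior + alpha * (1.0 - prior))

def posterior_after_negative(prior, power, alpha):
    """Bayes' rule: P(H true | non-significant result)."""
    miss = (1.0 - power) * prior
    return miss / (miss + (1.0 - alpha) * (1.0 - prior))

def p1_from_price(price, power_rep, alpha):
    """Invert price = p1 * power_rep + (1 - p1) * alpha for p1."""
    if not alpha <= price <= power_rep:
        raise ValueError("price must lie between alpha and power_rep in this model")
    return (price - alpha) / (power_rep - alpha)

def p0_from_p1(p1, power_orig, alpha):
    """Invert posterior_after_positive(p0, power_orig, alpha) = p1 for p0."""
    return alpha * p1 / (power_orig * (1.0 - p1) + alpha * p1)

def reconstruct(price, power_orig, power_rep, alpha, replicated):
    """Return (p0, p1, p2) implied by a final market price and the outcome."""
    p1 = p1_from_price(price, power_rep, alpha)
    p0 = p0_from_p1(p1, power_orig, alpha)
    if replicated:
        p2 = posterior_after_positive(p1, power_rep, alpha)
    else:
        p2 = posterior_after_negative(p1, power_rep, alpha)
    return p0, p1, p2

if __name__ == "__main__":
    # Hypothetical inputs: price 0.5, original power 0.5, replication power 0.9.
    print(reconstruct(0.5, power_orig=0.5, power_rep=0.9, alpha=0.05, replicated=True))
```

Because the replication's power enters both the price equation and the p2 update, a well-powered replication pushes p2 sharply up after a success and back down after a failure, mirroring the pattern the figures describe.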
Fig. 3.
Probability of a hypothesis being true at three different stages of testing: before the initial study (p0), after the initial study but before the replication (p1), and after the replication (p2). “Error bars” (or whiskers) represent the range, boxes span the first to third quartiles, and thick lines are medians. Initially, the priors of the tested hypotheses are relatively low, with a median of 8.8% (range, 0.7–66%). A positive result in an initial publication then moves the probability into a broad range of intermediate levels, with a median of 56% (range, 10–97%). If the replication succeeds, the probability moves further up, with a median of 98% (range, 93.0–99.2%). If the replication fails, the probability moves back to a range close to the initial prior, with a median of 6.3% (range, 0.01–80%).
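Each box in Fig. 3 is a five-number summary in which the whiskers span the full range rather than a multiple of the interquartile range. A minimal sketch of that summary follows, using Python's `statistics.quantiles`. The caption does not say which quartile definition was used, so `method="inclusive"` is an assumption, as are the function name and the sample values.

```python
import statistics

def box_summary(values):
    """Summary matching the Fig. 3 convention: whiskers = min/max (full range),
    box = first to third quartile, thick line = median. Needs >= 2 values."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}

if __name__ == "__main__":
    # Hypothetical per-study probabilities for one testing stage (not study data).
    stage_p0 = [0.02, 0.05, 0.08, 0.09, 0.12, 0.30]
    print(box_summary(stage_p0))
```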
Fig. S1.
Final positions per participant and market. The left panel shows the portfolios in the first set of prediction markets, and the right panel shows the portfolios in the second set. Long positions (bets on success) are shown in green, and short positions (bets on failure) are shown in red. In both sets of prediction markets, the participants held broad portfolios with positions in several markets, and each market attracted multiple traders. Traders often had diverging views: every market has at least one trader holding a long position and at least one trader holding a short position. The final portfolios show a few “bears” (predominantly betting on failure) who invested in short positions only (6 of 47 traders for the first set of markets; 4 of 45 traders for the second set) and “bulls” (predominantly betting on success) who invested in long positions only (3 of 47 traders for the first set; 6 of 45 traders for the second set). Most participants, however, fall along a wide spectrum between these two extremes.
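The bull/bear labels in Fig. S1 follow directly from the sign of each trader's holdings. A minimal sketch, assuming a portfolio is stored as a dict from a market id to a signed share count (positive = long, negative = short); the data layout, function names, and example portfolios are hypothetical.

```python
from collections import Counter

def classify_trader(positions):
    """Label a trader from final holdings: positions maps market id -> signed
    shares (positive = long, a bet on success; negative = short, a bet on failure)."""
    has_long = any(q > 0 for q in positions.values())
    has_short = any(q < 0 for q in positions.values())
    if has_long and has_short:
        return "mixed"
    if has_long:
        return "bull"   # long positions only
    if has_short:
        return "bear"   # short positions only
    return "flat"       # no open positions

def market_has_both_sides(portfolios, market):
    """True if at least one trader is long and at least one is short in `market`."""
    holdings = [p.get(market, 0) for p in portfolios.values()]
    return any(q > 0 for q in holdings) and any(q < 0 for q in holdings)

if __name__ == "__main__":
    # Hypothetical portfolios, not the study's data.
    portfolios = {
        "trader_a": {"m1": 10, "m2": 5},
        "trader_b": {"m1": -4, "m3": -2},
        "trader_c": {"m2": -3, "m3": 7},
    }
    print(Counter(classify_trader(p) for p in portfolios.values()))
    print({m: market_has_both_sides(portfolios, m) for m in ("m1", "m2", "m3")})
```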
Fig. S2.
(A) Trading interface introductory page. When entering the prediction market, participants were presented with all hypotheses along with their current price (“score”) and recent change in price. By clicking Adjust, the participants received more information on the study and the possibility to trade by buying and selling (a). For each replication, participants were presented with the hypothesis, the authors, the title, and the journal, and could buy stocks by choosing Yes or sell stocks by choosing No (b), and enter how many points they would like to invest in the specific hypothesis (c). (B) Position summary presented participants with an overview of their investments: which hypotheses, number of shares held, and current market value.
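The caption describes the trading screens but not the rule that turned Yes/No purchases into the displayed price ("score"). Purely as a hypothetical illustration, the sketch below uses Hanson's logarithmic market scoring rule (LMSR), a common automated market maker for binary contracts; it is not presented as the mechanism of the commercial interface used in the study. The class name, the liquidity parameter `b`, and buying by share count (rather than by points invested, as in the interface) are all assumptions.

```python
import math

class LMSRMarket:
    """Hypothetical binary Yes/No market maker using Hanson's LMSR."""

    def __init__(self, b=100.0):
        self.b = b          # liquidity parameter (assumed value)
        self.q_yes = 0.0    # outstanding "Yes" shares (bets on successful replication)
        self.q_no = 0.0     # outstanding "No" shares (bets on failed replication)

    def _cost(self, q_yes, q_no):
        # Cost function C(q) = b * ln(exp(q_yes/b) + exp(q_no/b)), via log-sum-exp.
        a, c = q_yes / self.b, q_no / self.b
        m = max(a, c)
        return self.b * (m + math.log(math.exp(a - m) + math.exp(c - m)))

    def price_yes(self):
        # Instantaneous "Yes" price, read as the market's probability of replication.
        return 1.0 / (1.0 + math.exp((self.q_no - self.q_yes) / self.b))

    def buy(self, side, shares):
        """Buy `shares` of "yes" or "no"; returns the points charged."""
        if side not in ("yes", "no"):
            raise ValueError("side must be 'yes' or 'no'")
        new_yes = self.q_yes + (shares if side == "yes" else 0.0)
        new_no = self.q_no + (shares if side == "no" else 0.0)
        charged = self._cost(new_yes, new_no) - self._cost(self.q_yes, self.q_no)
        self.q_yes, self.q_no = new_yes, new_no
        return charged

if __name__ == "__main__":
    market = LMSRMarket(b=100.0)
    print(market.price_yes())           # starts at 0.5
    print(market.buy("yes", 50))        # points charged for 50 "Yes" shares
    print(market.price_yes())           # price rises after buying "Yes"
```

Under this rule each purchase costs less than one point per share and moves the price toward the buyer's side, with larger `b` making the price less sensitive to any single trade.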
Fig. S3.
Comparison of survey responses and behavior in the two prediction markets. (A) Correlation between market price and average survey response. Market prices and average survey responses are positively correlated, suggesting that information given in the surveys was also revealed in the market (Pearson correlation coefficient of 0.78, P < 0.001, n = 43). However, market prices are more “extreme” than survey responses, which translates into a lower prediction error. Studies that were replicated successfully are shown in black, studies that failed to replicate are shown in red, and studies that remained unfinished are shown in gray. (B) Correlation between volume of traded shares and diversity in survey responses (i.e., SD of responses; Pearson correlation coefficient of 0.51, P < 0.001, n = 43). The positive correlation between market volume and survey diversity suggests that there was more trading for studies whose replicability participants viewed more divergently. In other words, when premarket views are more diverse, more trades are required to reach a “consensus” in the market price. (C) Negative correlation between market price and diversity in survey responses (Pearson correlation coefficient of −0.53, P < 0.001, n = 43). The diversity of survey responses is higher when the prediction market predicts a low probability that the original result will be replicated. This suggests that there is more disagreement about replications that are expected to fail than about replications expected to succeed.
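The three panels rest on two quantities: a Pearson correlation between per-study measures, and, for panels B and C, the spread (SD) of survey responses for each study. A minimal sketch of both follows; whether the authors used the sample or population SD is not stated, so the sample SD here is an assumption, and the function names are hypothetical. P-values are not computed. On Python 3.10+, `statistics.correlation` gives the same coefficient as `pearson_r`.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def response_spread(responses):
    """Diversity of one study's survey responses, taken here as the sample SD."""
    return statistics.stdev(responses)

if __name__ == "__main__":
    # Hypothetical per-study values: final prices and raw survey responses.
    prices = [0.2, 0.4, 0.7, 0.9]
    surveys = [[0.1, 0.6, 0.4], [0.3, 0.5, 0.6], [0.6, 0.7, 0.65], [0.8, 0.85, 0.9]]
    spreads = [response_spread(s) for s in surveys]
    print(pearson_r(prices, spreads))  # analogous to panel C
```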

Comment in

  • Cracking the brain's genetic code.
    Thompson PM. Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15269-70. doi: 10.1073/pnas.1520702112. Epub 2015 Nov 18. PMID: 26582794.
  • Markets for replication.
    Brandon A, List JA. Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15267-8. doi: 10.1073/pnas.1521417112. Epub 2015 Dec 2. PMID: 26631745.
