Nat Hum Behav. 2025 Feb;9(2):287-304.
doi: 10.1038/s41562-024-01961-1. Epub 2024 Dec 20.

Predicting the replicability of social and behavioural science claims in COVID-19 preprints

Alexandru Marcoci et al. Nat Hum Behav. 2025 Feb.

Abstract

Replications are important for assessing the reliability of published findings. However, they are costly, and it is infeasible to replicate everything. Accurate, fast, lower-cost alternatives such as eliciting predictions could accelerate assessment for rapid policy implementation in a crisis and help guide a more efficient allocation of scarce replication resources. We elicited judgements from participants on 100 claims from preprints about an emerging area of research (COVID-19 pandemic) using an interactive structured elicitation protocol, and we conducted 29 new high-powered replications. After interacting with their peers, participant groups with lower task expertise ('beginners') updated their estimates and confidence in their judgements significantly more than groups with greater task expertise ('experienced'). For experienced individuals, the average accuracy was 0.57 (95% CI: [0.53, 0.61]) after interaction, and they correctly classified 61% of claims; beginners' average accuracy was 0.58 (95% CI: [0.54, 0.62]), correctly classifying 69% of claims. The difference in accuracy between groups was not statistically significant and their judgements on the full set of claims were correlated (r(98) = 0.48, P < 0.001). These results suggest that both beginners and more-experienced participants using a structured process have some ability to make better-than-chance predictions about the reliability of 'fast science' under conditions of high uncertainty. However, given the importance of such assessments for making evidence-based critical decisions in a crisis, more research is required to understand who the right experts in forecasting replicability are and how their judgements ought to be elicited.
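The two accuracy measures reported above can be made concrete with a short sketch. It assumes, as an illustration rather than the paper's stated definitions, that error-based accuracy is the mean of 1 − |p − y| over claims, where p is a participant's best estimate of the probability that a claim replicates and y is 1 if it replicated and 0 otherwise, and that classification accuracy counts an estimate above 0.5 as a prediction of success while skipping estimates of exactly 0.5. The function names are hypothetical.

```python
from typing import Sequence


def error_based_accuracy(estimates: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of 1 - |p - y| over claims (assumed definition; p in [0, 1], y in {0, 1})."""
    if not estimates or len(estimates) != len(outcomes):
        raise ValueError("need non-empty inputs of equal length")
    return sum(1 - abs(p - y) for p, y in zip(estimates, outcomes)) / len(estimates)


def classification_accuracy(
    estimates: Sequence[float], outcomes: Sequence[int], threshold: float = 0.5
) -> float:
    """Share of claims where 'p > threshold' agrees with 'replicated' (y == 1).

    Estimates exactly at the threshold express no prediction and are skipped
    (an assumption made for this sketch, not taken from the paper).
    """
    if len(estimates) != len(outcomes):
        raise ValueError("inputs must have equal length")
    hits = [(p > threshold) == (y == 1) for p, y in zip(estimates, outcomes) if p != threshold]
    if not hits:
        raise ValueError("no estimate differs from the threshold")
    return sum(hits) / len(hits)


# Toy data: four best estimates and whether each claim replicated (1) or not (0).
estimates = [0.8, 0.3, 0.5, 0.6]
outcomes = [1, 0, 1, 0]
print(error_based_accuracy(estimates, outcomes))
print(classification_accuracy(estimates, outcomes))
```

With these toy inputs, the error-based score is 0.6, while only three of the four estimates are classifiable and two of those are correct; the two measures can therefore rank the same judgements differently.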

Conflict of interest statement

Competing interests: A.M. is a UKRI Policy Fellow seconded to the Department for Science, Innovation and Technology. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department for Science, Innovation and Technology or the UK Government. B.A.N., T.M.E., O.M., Z.L., A.H.T., B.L., N.F., E.S.P., M.K.S. and A.L.A. are or were employees of the nonprofit Center for Open Science that has a mission to increase openness, integrity and reproducibility of research. The remaining authors declare no competing interests.

Figures

Fig. 1. The IDEA protocol.
The IDEA protocol as implemented on the repliCATS platform.
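The round structure of the IDEA (Investigate, Discuss, Estimate, Aggregate) protocol can be sketched as follows. The sketch assumes that each participant gives a lower bound, best estimate and upper bound for the probability that a claim replicates, and that the group judgement is the unweighted mean of Round 2 best estimates; the repliCATS platform may use a different aggregation rule. `Judgement`, `aggregate` and `mean_update` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Judgement:
    """One participant's three-point estimate of replication probability (hypothetical structure)."""

    lower: float
    best: float
    upper: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.lower <= self.best <= self.upper <= 1.0:
            raise ValueError("expected 0 <= lower <= best <= upper <= 1")

    @property
    def width(self) -> float:
        """Interval width, a simple proxy for the participant's uncertainty."""
        return self.upper - self.lower


def aggregate(round_two: Sequence[Judgement]) -> float:
    """Group judgement as the unweighted mean of Round 2 best estimates (assumed rule)."""
    return mean(j.best for j in round_two)


def mean_update(round_one: Sequence[Judgement], round_two: Sequence[Judgement]) -> float:
    """Average absolute change in best estimate after discussion, paired by participant."""
    if len(round_one) != len(round_two):
        raise ValueError("rounds must contain the same participants")
    return mean(abs(b.best - a.best) for a, b in zip(round_one, round_two))


round_one = [Judgement(0.3, 0.5, 0.7), Judgement(0.4, 0.6, 0.8), Judgement(0.2, 0.5, 0.7)]
round_two = [Judgement(0.4, 0.6, 0.8), Judgement(0.5, 0.7, 0.9), Judgement(0.2, 0.5, 0.7)]
print(aggregate(round_two), mean_update(round_one, round_two))
```

Validating the three-point ordering at construction keeps malformed judgements out of the aggregate; a larger `mean_update` corresponds to the stronger post-discussion revisions the abstract reports for beginners.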
Fig. 2. Overview of the repliCATS platform.
Overview of the repliCATS platform as displayed to participants in Round 2. The full platform view is shown in the centre, summarizing Round 1 responses from 7 participants for one of the evaluated research claims. Enlarged platform components show examples of the Round 2 elicitation questions (collapsed, a), the research claim’s statistical summary information (b), an example restatement of the claim from one of the participants in response to Q1 on the platform (c) and an example of Round 1 participant reasoning paired with their quantitative replicability judgement in response to Q3 on the platform (d).
Fig. 3. Participants’ best estimates.
Smoothed distribution of participants’ best estimates for each of the 29 known-outcome research claims whose replications had ≥0.8 power at α = 0.05, organized by type of replication (new or secondary data) and success (did or did not replicate). Experienced participants are shown in yellow and beginners in blue.
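Fig. 3 restricts attention to replications with at least 0.8 power at α = 0.05. To give a sense of what that threshold implies, the sketch below uses the standard Fisher z approximation for a two-sided test that a correlation is zero. It assumes the effect is expressed as a Pearson correlation r, which will not hold for every claim, and it is not necessarily the procedure the replication teams used to set sample sizes. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def n_for_correlation(r: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n with approximate power >= `power` for a two-sided test of rho = 0.

    Fisher z approximation: n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the same test; the negligible opposite tail is ignored."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(math.atanh(abs(r)) * math.sqrt(n - 3) - z_alpha)


print(n_for_correlation(0.3), power_for_correlation(0.3, 84), power_for_correlation(0.3, 85))
```

For r = 0.3 the approximation calls for 85 participants; with 84 the approximate power falls just below 0.8, which illustrates how sharply the ≥0.8 criterion can bind.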
Fig. 4. Predictive accuracy results.
Average error-based and classification accuracy results and 95% confidence intervals for both individuals and groups of beginners and experienced participants. Estimates and 95% confidence intervals (mean ± s.e. × 1.96) are drawn from the linear models described in Table 3, refitted with no reference class. Statistical test results: a, Round one (Beginners: estimated effect size β̂ = 0.563, 95% CI = [0.531, 0.595]; Experienced: β̂ = 0.568, 95% CI = [0.535, 0.602]; Difference: t(603.01) = −0.274, P = 0.784, β̂ = −0.005, 95% CI = [−0.043, 0.032], n = 606); Round two (Beginners: β̂ = 0.577, 95% CI = [0.536, 0.617]; Experienced: β̂ = 0.569, 95% CI = [0.527, 0.611]; Difference: t(591.00) = 0.431, P = 0.667, β̂ = 0.008, 95% CI = [−0.028, 0.044], n = 594). b, Round one (Beginners: β̂ = 0.675, 95% CI = [0.618, 0.732]; Experienced: β̂ = 0.642, 95% CI = [0.594, 0.689]; Difference: t(336.00) = 0.886, P = 0.376, β̂ = 0.033, 95% CI = [−0.041, 0.107], n = 338); Round two (Beginners: β̂ = 0.694, 95% CI = [0.614, 0.775]; Experienced: β̂ = 0.613, 95% CI = [0.538, 0.688]; Difference: t(326.93) = 2.131, P = 0.034, β̂ = 0.081, 95% CI = [0.007, 0.156], n = 329). c, Round one (Beginners: β̂ = 0.535, 95% CI = [0.482, 0.589]; Experienced: β̂ = 0.569, 95% CI = [0.515, 0.622]; Difference: t(114.00) = −1.145, P = 0.255, β̂ = −0.033, 95% CI = [−0.090, 0.024], n = 116); Round two (Beginners: β̂ = 0.544, 95% CI = [0.493, 0.594]; Experienced: β̂ = 0.564, 95% CI = [0.513, 0.614]; Difference: t(113.00) = −0.580, P = 0.563, β̂ = −0.020, 95% CI = [−0.087, 0.047], n = 116).
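Because the intervals in this caption are built as estimate ± 1.96 s.e., a standard error, and from it an approximate test statistic, can be recovered from any reported interval. The sketch also includes Welch's two-sample t-test as a simpler stand-in for the Table 3 linear models; the fractional degrees of freedom above are consistent with a Satterthwaite-type approximation, but that is an assumption about how the models were fitted. The function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence

Z95 = 1.96  # multiplier in the caption's "mean ± s.e. × 1.96"


def se_from_ci(lower: float, upper: float, z: float = Z95) -> float:
    """Standard error implied by a symmetric interval: (upper - lower) / (2z)."""
    return (upper - lower) / (2 * z)


def t_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Ratio of an estimate to the standard error implied by its interval."""
    return estimate / se_from_ci(lower, upper)


def welch_t(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Welch's t for mean(a) - mean(b) and its Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Panel b, Round two difference: beta = 0.081, 95% CI = [0.007, 0.156].
print(se_from_ci(0.007, 0.156), t_from_ci(0.081, 0.007, 0.156))
print(welch_t([1, 2, 3, 4], [2, 4, 6, 8]))
```

For the Round two difference in panel b, the interval [0.007, 0.156] implies s.e. ≈ 0.038, and dividing 0.081 by it gives t ≈ 2.131, matching the reported t(326.93) = 2.131. Since the published values are rounded, other panels reproduce only approximately.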
Fig. 5. Structured group judgements vs final market prices.
Pearson correlations between Round 2 structured group judgements (collected by the repliCATS team) and final market price for both beginners and experienced participants. Correlations are calculated with a sample size of 100, and the regression line and 95% confidence intervals are calculated using major axis regression.
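Fig. 5 combines two calculations: a Pearson correlation over the 100 claims (the same statistic the abstract reports as r(98) = 0.48, with n − 2 = 98 degrees of freedom, for beginners' versus experienced judgements) and a major axis regression line, which treats both variables symmetrically instead of regressing one on the other. A standard-library sketch of both follows; the confidence band shown in the figure is not reproduced, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _centred_sums(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float, float, float]:
    """Means and centred sums of squares/cross-products (Sxx, Syy, Sxy)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    _, _, sxx, syy, sxy = _centred_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def major_axis(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Slope and intercept of the major axis (first principal axis of the scatter)."""
    mx, my, sxx, syy, sxy = _centred_sums(x, y)
    if sxy == 0:
        raise ValueError("major axis slope is undefined when the covariance is zero")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy**2)) / (2 * sxy)
    return slope, my - slope * mx


print(pearson_r([1, 2, 3], [1, 3, 2]), major_axis([1, 2, 3], [1, 3, 2]))
print(t_from_r(0.48, 98))
```

For the abstract's r = 0.48 with 98 degrees of freedom, `t_from_r` gives t ≈ 5.42, in line with the reported P < 0.001.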
Fig. 6. Participants’ best estimates and interval widths.
Average best estimates and interval widths for both beginners and experienced participants. Estimates and 95% confidence intervals (mean ± s.e. × 1.96) are drawn from the linear models described in Table 3, refitted with no reference class. Statistical test results: a, Round one (Beginners: β̂ = 0.632, 95% CI = [0.614, 0.650]; Experienced: β̂ = 0.594, 95% CI = [0.576, 0.612]; Difference: t(1981.3591) = 4.596, P < 0.0001, β̂ = 0.038, s.e. = 0.008, 95% CI = [0.022, 0.054], n = 2080); Round two (Beginners: β̂ = 0.642, 95% CI = [0.622, 0.662]; Experienced: β̂ = 0.587, 95% CI = [0.567, 0.607]; Difference: t(1980.0922) = 8.008, P < 0.0001, β̂ = 0.055, s.e. = 0.007, 95% CI = [0.041, 0.068], n = 2080). b, Round one (Beginners: β̂ = 0.309, 95% CI = [0.298, 0.319]; Experienced: β̂ = 0.317, 95% CI = [0.306, 0.328]; Difference: t(1988.3245) = −1.044, P = 0.297, β̂ = −0.008, s.e. = 0.008, 95% CI = [−0.023, 0.007], n = 2080); Round two (Beginners: β̂ = 0.289, 95% CI = [0.279, 0.298]; Experienced: β̂ = 0.318, 95% CI = [0.308, 0.327]; Difference: t(1985.2034) = −4.882, P < 0.0001, β̂ = −0.029, s.e. = 0.006, 95% CI = [−0.041, −0.017], n = 2080).
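Since Fig. 6 reports a standard error next to each difference, the stated intervals can be checked directly against the estimate ± 1.96 s.e. construction. The helper name `wald_interval` is hypothetical; because the published inputs are rounded to three decimals, reproduced bounds may differ from the caption in the last digit (the panel b Round one difference is one such case).

```python
def wald_interval(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Return (estimate - z*se, estimate + z*se), the 'mean ± s.e. × 1.96' construction."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    half_width = z * se
    return estimate - half_width, estimate + half_width


# Panel a, Round one difference: beta = 0.038, s.e. = 0.008, reported 95% CI = [0.022, 0.054].
lo, hi = wald_interval(0.038, 0.008)
print(round(lo, 3), round(hi, 3))  # prints 0.022 0.054

# Panel b, Round two difference: beta = -0.029, s.e. = 0.006, reported 95% CI = [-0.041, -0.017].
lo, hi = wald_interval(-0.029, 0.006)
print(round(lo, 3), round(hi, 3))
```

Both of these differences reproduce their published bounds after rounding, which is a quick consistency check on the caption's numbers.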
