R Soc Open Sci. 2023 Mar 15;10(3):221042.
doi: 10.1098/rsos.221042. eCollection 2023 Mar.

The logical structure of experiments lays the foundation for a theory of reproducibility

Erkan O Buzbas et al. R Soc Open Sci. 2023.

Abstract

The scientific reform movement has proposed openness as a potential remedy to the putative reproducibility or replication crisis. However, the conceptual relationship among openness, replication experiments and results reproducibility has been obscure. We analyse the logical structure of experiments, define the mathematical notion of idealized experiment and use this notion to advance a theory of reproducibility. Idealized experiments clearly delineate the concepts of replication and results reproducibility, and capture key differences with precision, allowing us to study the relationship among them. We show how results reproducibility varies as a function of the elements of an idealized experiment, the true data-generating mechanism, and the closeness of the replication experiment to an original experiment. We clarify how openness of experiments is related to designing informative replication experiments and to obtaining reproducible results. With formal backing and evidence, we argue that the current 'crisis' reflects inadequate attention to a theoretical understanding of results reproducibility.

Keywords: experiment; metascience; open science; replication; reproducibility; statistical theory.

Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
Six idealized experiments ξ_bin, ξ_negbin, ξ_hyper, ξ_poi, ξ_exp, ξ_nor: the binomial, negative binomial, hypergeometric, Poisson approximation to binomial, exponential waiting times between Poisson events and normal approximation to binomial, respectively. All but ξ_hyper assume an infinite population (A) of black and white ravens, with sampling designs resulting in distinct probability models (M_A). ξ_hyper assumes sampling from a finite subset of the population. All experiments aim at performing inference on a result (R), which reduces to an estimate of either the population proportion of black ravens or the mean number of black ravens in the population.
Figure 2.
For the models in the toy example, degrees of openness (as given by definition 2.6) are depicted in eight networks, each consisting of the same 24 idealized experiments. Each idealized experiment is represented by a node in each network. These 24 experiments are obtained by a 6 × 2 × 2 factorial design. The first factor, M_A, takes six values: binomial, negative binomial, hypergeometric, Poisson, exponential and normal. The second factor, S_post, takes two values: MLE and posterior mode. The third factor, D_s, takes two values: n = 30 and n = 200. Connections between nodes represent potential substitutions of non-open elements of idealized experiments. As more elements of an idealized experiment are non-open, the probability of choosing an exact replication decreases, as indicated by increased connectivity in the network.
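The factorial design described in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the dictionary keys (`model`, `estimator`, `n`) and the function names are hypothetical, and the probability calculation assumes that a replicator picks each non-open element uniformly at random from its listed values, which the caption does not state.

```python
from itertools import product
from math import prod

# Factor levels as listed in the Figure 2 caption.
# The key names are hypothetical labels chosen for this sketch.
FACTORS = {
    "model": ["binomial", "negative binomial", "hypergeometric",
              "Poisson", "exponential", "normal"],
    "estimator": ["MLE", "posterior mode"],
    "n": [30, 200],
}


def enumerate_experiments():
    """Return every combination of factor levels (6 x 2 x 2 = 24)."""
    names = list(FACTORS)
    return [dict(zip(names, levels))
            for levels in product(*(FACTORS[name] for name in names))]


def exact_replication_probability(non_open):
    """Chance of matching the original experiment exactly.

    ASSUMPTION: each non-open element is guessed uniformly at random
    from its possible values, independently of the others.
    """
    return 1 / prod(len(FACTORS[name]) for name in non_open)


experiments = enumerate_experiments()
print(len(experiments))  # 24
for hidden in ([], ["n"], ["model"], ["model", "estimator", "n"]):
    print(hidden, exact_replication_probability(hidden))
```

Under the uniform-choice assumption, the probability of an exact replication shrinks as more elements are withheld, which matches the qualitative statement in the caption.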
Figure 3.
Reproducibility rates of a true result in sequences of 1000 exact (a,b) and non-exact (c) replication experiments. S_post is varied as MLE and posterior mode. D_s is varied as n = 30 and n = 200. Each condition is colour coded and consists of 100 independent runs. (a) M_A: Poisson. Orange: MLE, n = 200. Purple: posterior mode, n = 200. Light green: MLE, n = 30. Light blue: posterior mode, n = 30. (b) M_A: Normal. Dark green: posterior mode, n = 200. Dark blue: MLE, n = 200. Pale blue: posterior mode, n = 30. Rose: MLE, n = 30. (c) Three cases of 1000 non-exact replication experiments, chosen uniformly at random from the set of all eight idealized experiments (magenta), the four idealized experiments with the lowest reproducibility rates (aqua blue), and the four idealized experiments with the highest reproducibility rates (yellow). (a–c) Asterisks denote the mean of the reproducibility rates of 100 runs at step 1000, an estimate of the true reproducibility rate for the sequence of idealized experiments. (d) Variances of all 11 exact and non-exact sequences at step 50 of the simulation with respect to the estimated reproducibility rate (see text for interpretation).
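The kind of simulation summarized in the Figure 3 caption can be illustrated with a deliberately simplified sketch. It assumes that each replication independently reproduces the result with a fixed probability `p`; that value, the function names and the seed are hypothetical, and the paper's actual criterion for a reproduced result is not modelled here.

```python
import random


def running_rates(p, steps, rng):
    """Running fraction of successful replications after each step."""
    hits = 0
    rates = []
    for step in range(1, steps + 1):
        # ASSUMPTION: each replication succeeds independently with probability p.
        hits += rng.random() < p
        rates.append(hits / step)
    return rates


def simulate(p=0.7, steps=1000, runs=100, seed=0):
    """Run several independent sequences and average the final rates."""
    rng = random.Random(seed)
    paths = [running_rates(p, steps, rng) for _ in range(runs)]
    # Mean of the rates at the last step, analogous to the asterisks in the figure.
    final_mean = sum(path[-1] for path in paths) / runs
    return paths, final_mean


paths, final_mean = simulate()
print(len(paths), len(paths[0]), round(final_mean, 3))
```

Each path fluctuates widely in its early steps and settles as replications accumulate, so the mean of the final rates serves as an estimate of the assumed `p`.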
Figure 4.
(a) Empirical cumulative distribution function (ECDF) of a sample of size 30, emphasizing that the ECDF is a right continuous function. (b) ECDF of the sample in (a) (black) and that of an independent sample of size 10 (red) emphasizing that the ECDF is a random variable whose probability distribution is determined by the sample values (and hence data-generating mechanism). (c) One hundred independent samples of varying sample size (grey) emphasizing that ECDF is a stochastic process. Red vertical line shows the distribution of ECDF conditional on value x*.
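The ECDF described in the Figure 4 caption is straightforward to compute. The sketch below uses the standard definition, the fraction of observations less than or equal to x, and is not taken from the paper; the function name `ecdf` and the example sample are hypothetical.

```python
import bisect


def ecdf(sample):
    """Return the empirical CDF of `sample` as a callable step function."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right counts the observations <= x, so F already takes
        # its new value at each jump point, which makes it right continuous.
        return bisect.bisect_right(xs, x) / n

    return F


F = ecdf([3, 1, 2])
print(F(0), F(1), F(2.5), F(3))
```

Because the returned function depends on the observed values, a different sample from the same data-generating mechanism yields a different step function, which is the sense in which the caption calls the ECDF a random variable.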
Figure 5.
Epistemic versus in-principle reproducibility with an example of Bayesian information flow and learning (details are provided in the appendix text).
Figure 6.
A simulation example to illustrate the convergence of reproducibility rates from exact and non-exact replication experiments to their true value. See text within the appendix for description of this figure.
