The logical structure of experiments lays the foundation for a theory of reproducibility

Erkan O Buzbas¹, Berna Devezer¹,², Bert Baumgaertner³

Affiliations

¹ Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID 83844, USA.
² Department of Business, University of Idaho, Moscow, ID 83844, USA.
³ Department of Politics and Philosophy, University of Idaho, Moscow, ID 83844, USA.

PMID: 36938532
PMCID: PMC10014247
DOI: 10.1098/rsos.221042

Erkan O Buzbas et al. R Soc Open Sci. 2023 Mar 15;10(3):221042. doi: 10.1098/rsos.221042. eCollection 2023 Mar.

Abstract

The scientific reform movement has proposed openness as a potential remedy to the putative reproducibility or replication crisis. However, the conceptual relationship among openness, replication experiments and results reproducibility has been obscure. We analyse the logical structure of experiments, define the mathematical notion of idealized experiment and use this notion to advance a theory of reproducibility. Idealized experiments clearly delineate the concepts of replication and results reproducibility, and capture key differences with precision, allowing us to study the relationship among them. We show how results reproducibility varies as a function of the elements of an idealized experiment, the true data-generating mechanism, and the closeness of the replication experiment to an original experiment. We clarify how openness of experiments is related to designing informative replication experiments and to obtaining reproducible results. With formal backing and evidence, we argue that the current 'crisis' reflects inadequate attention to a theoretical understanding of results reproducibility.

Keywords: experiment; metascience; open science; replication; reproducibility; statistical theory.

Conflict of interest statement

We declare we have no competing interests.

Figures

**Figure 1.**
Six idealized experiments ξ_bin, ξ_negbin, ξ_hyper, ξ_poi, ξ_exp, ξ_nor: the binomial, negative binomial, hypergeometric, Poisson approximation to binomial, exponential waiting times between Poisson events and normal approximation to binomial, respectively. All but ξ_hyper assume an infinite population (A) of black and white ravens, with sampling designs resulting in distinct probability models (M_A). ξ_hyper assumes sampling from a finite subset of the population. All experiments aim to perform inference on a result (R), which reduces to an estimate of either the population proportion of black ravens or the mean number of black ravens in the population.

**Figure 2.**
For the models in the toy example, degrees of openness (as given by definition 2.6) are depicted in eight networks, each consisting of the same 24 idealized experiments. Each idealized experiment is represented by a node in each network. These 24 experiments are obtained by a 6 × 2 × 2 factorial design. The first factor, M_A, takes six values: binomial, negative binomial, hypergeometric, Poisson, exponential and normal. The second factor, S_post, takes two values: MLE and posterior mode. The third factor, D_s, takes two values: n = 30 and n = 200. Connections between nodes represent potential substitutions of non-open elements of idealized experiments. As more elements of an idealized experiment are non-open, the probability of choosing an exact replication decreases, as indicated by increased connectivity in the network.
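
The 6 × 2 × 2 design in this caption can be enumerated directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the assumption that a replicator picks uniformly at random among all experiments that agree with the original on its open elements is made here only to illustrate why more non-open elements lower the chance of an exact replication.

```python
from fractions import Fraction
from itertools import product

# Factor levels as listed in the caption.
FACTORS = {
    "M_A": ["binomial", "negative binomial", "hypergeometric",
            "Poisson", "exponential", "normal"],
    "S_post": ["MLE", "posterior mode"],
    "D_s": [30, 200],
}


def all_experiments():
    """Return the 24 idealized experiments of the full factorial design."""
    names = list(FACTORS)
    return [dict(zip(names, combo)) for combo in product(*FACTORS.values())]


def candidate_replications(original, non_open):
    """Experiments that match the original on every open element."""
    return [
        e for e in all_experiments()
        if all(e[k] == original[k] for k in FACTORS if k not in non_open)
    ]


def prob_exact_replication(original, non_open):
    """Chance of an exact replication under the ASSUMED uniform choice."""
    return Fraction(1, len(candidate_replications(original, non_open)))


original = {"M_A": "Poisson", "S_post": "MLE", "D_s": 200}
for hidden in [set(), {"D_s"}, {"M_A"}, {"M_A", "S_post", "D_s"}]:
    print(sorted(hidden), prob_exact_replication(original, hidden))
```

With every element open, only the original itself remains as a candidate; each additional non-open element multiplies the number of candidates by the number of levels of that factor.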

**Figure 3.**
Reproducibility rates of a true result in sequences of 1000 exact (*a,b*) and non-exact (c) replication experiments. S_post is varied as MLE and posterior mode. *D_s* is varied as n = 30 and n = 200. Each condition is colour coded and consists of 100 independent runs. (a) *M_A*: Poisson. Orange; MLE, n = 200. Purple; posterior mode, n = 200. Light green; MLE, n = 30. Light blue; posterior mode, n = 30. (b) *M_A*: Normal. Dark green; posterior mode, n = 200. Dark blue; MLE, n = 200. Pale blue; posterior mode, n = 30. Rose; MLE, n = 30. (c) Three cases of 1000 non-exact replication experiments where they are chosen uniformly randomly from the set of all eight idealized experiments (magenta), four idealized experiments with lowest reproducibility rates (aqua blue), and four idealized experiments with highest reproducibility rates (yellow). (a–c) Asterisks denote the mean of the reproducibility rates of 100 runs at step 1000, an estimate of the true reproducibility rate for the sequence of idealized experiments. (d) Variances of all 11 exact and non-exact sequences at step 50 of the simulation with respect to the estimated reproducibility rate (see text for interpretation).
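
A running reproducibility rate over a sequence of exact replications, as plotted in panels (a) and (b), could be simulated along the following lines. This is a sketch under stated assumptions: the caption does not say what counts as a reproduced result, so the criterion used here (the estimate falls within a tolerance of the true proportion) is hypothetical, as are the binomial model, the true proportion 0.3, the tolerance 0.05 and all function names.

```python
import random


def running_reproducibility_rate(p_true, n, steps, tol, rng):
    """Running fraction of replications whose result counts as reproduced."""
    rates = []
    hits = 0
    for step in range(1, steps + 1):
        # One exact replication: sample n ravens, estimate the proportion
        # of black ravens by maximum likelihood (k / n under the binomial).
        k = sum(rng.random() < p_true for _ in range(n))
        estimate = k / n
        # HYPOTHETICAL criterion, not taken from the source.
        hits += abs(estimate - p_true) <= tol
        rates.append(hits / step)
    return rates


rng = random.Random(0)
for n in (30, 200):
    rates = running_reproducibility_rate(0.3, n, 1000, 0.05, rng)
    # Rate after 50 steps and after the full sequence of 1000 steps.
    print(n, rates[49], rates[-1])
```

Repeating the call with independent seeds gives a bundle of trajectories per condition; averaging their final values is one way to estimate the rate that each sequence converges to.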

**Figure 4.**
(a) Empirical cumulative distribution function (ECDF) of a sample of size 30, emphasizing that the ECDF is a right continuous function. (b) ECDF of the sample in (a) (black) and that of an independent sample of size 10 (red) emphasizing that the ECDF is a random variable whose probability distribution is determined by the sample values (and hence data-generating mechanism). (c) One hundred independent samples of varying sample size (grey) emphasizing that ECDF is a stochastic process. Red vertical line shows the distribution of ECDF conditional on value x*.
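
The properties named in this caption can be checked with a few lines of code. The sketch below is illustrative only: the function name `ecdf` and the standard normal data-generating mechanism are assumptions, not details from the source.

```python
import bisect
import random


def ecdf(sample):
    """Return the ECDF of a sample as a function F(x) = #{x_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right counts the sample values <= x, which makes F
        # right continuous: the jump is included at each sample value.
        return bisect.bisect_right(xs, x) / n

    return F


# Right continuity on a tiny sample: the jump at 1 is included at x = 1.
F = ecdf([3, 1, 2])
print(F(0.999), F(1), F(3))

# The ECDF is random: independent samples from the same (assumed)
# mechanism give different values at a fixed point x*.
rng = random.Random(0)
x_star = 0.5
values = [
    ecdf([rng.gauss(0, 1) for _ in range(30)])(x_star) for _ in range(5)
]
print(values)
```

The spread of `values` across samples corresponds to the conditional distribution marked by the red vertical line in panel (c).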

**Figure 5.**
Epistemic versus in-principle reproducibility with an example of Bayesian information flow and learning (details are provided within the appendix text).

**Figure 6.**
A simulation example to illustrate the convergence of reproducibility rates from exact and non-exact replication experiments to their true value. See text within the appendix for description of this figure.

Grants and funding

P20 GM104420/GM/NIGMS NIH HHS/United States
