R Soc Open Sci. 2021 Mar 31;8(3):200805.
doi: 10.1098/rsos.200805.

The case for formal methodology in scientific reform



Berna Devezer et al. R Soc Open Sci.

Abstract

Current attempts at methodological reform in the sciences come in response to an overall lack of rigor in methodological and scientific practices in experimental sciences. However, most methodological reform attempts suffer from mistakes and over-generalizations similar to the ones they aim to address. We argue that this can be attributed in part to a lack of formalism and first principles. Considering the costs of allowing false claims to become canonized, we argue for formal statistical rigor and scientific nuance in methodological reform. To attain this rigor and nuance, we propose a five-step formal approach for solving methodological problems. To illustrate the use and benefits of such formalism, we present a formal statistical analysis of three popular claims in the metascientific literature: (i) that reproducibility is the cornerstone of science; (ii) that data must not be used twice in any analysis; and (iii) that exploratory projects imply poor statistical practice. We show how our formal approach can inform and shape debates about such methodological claims.

Keywords: double-dipping; exploratory research; replication; reproducibility; scientific reform.


Figures

Figure 1.
(a) The reproducibility rate of a true result decreases with measurement error in a misspecified simple linear regression model. The reproducibility rate is estimated as the proportion of times the 95% confidence interval captures the true effect. Sample sizes are 50 (small) and 500 (large). The true regression coefficient of the predictor variable is either 2 (small effect) or 20 (large effect). Model details are given in appendix D. (b) Example data (black points) generated under the simple linear regression model E(Y) = 2 + 20X. Measurement and sampling errors are normally distributed with standard deviations equal to 3. Regression lines are fit under the measurement error model (magenta line) and the correct model (blue line) with a sample size of 100. The 95% confidence interval for the regression coefficient obtained under the measurement error model is (7.94, 12.37), which does not include the true value 20. By contrast, the 95% confidence interval for the regression coefficient obtained under the correct model, (19.86, 20.21), includes the true value. For the code generating all simulations and figures in the article, see the electronic supplementary material.
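To make the mechanism in Figure 1(a) concrete, here is a minimal sketch of how a reproducibility rate can be estimated by simulation: generate data from a simple linear regression, observe the predictor with additive error, fit ordinary least squares to the error-laden predictor, and count how often the 95% confidence interval captures the true slope. This is not the authors' supplementary code. The predictor distribution (X ~ N(0, x_sd^2)), the classical additive error W = X + U, the default standard deviations, and the helper names `ols_slope_ci` and `reproducibility_rate` are all hypothetical choices, and the interval uses a normal (z) quantile rather than a t quantile for simplicity.

```python
import math
import random
from statistics import NormalDist


def ols_slope_ci(x, y, level=0.95):
    """OLS slope and a normal-approximation confidence interval (simple linear regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for level=0.95 (z, not t, quantile)
    return slope, slope - z * se, slope + z * se


def reproducibility_rate(beta1, n, meas_sd, reps=1000, beta0=2.0,
                         x_sd=1.0, noise_sd=3.0, seed=0):
    """Fraction of replications whose 95% CI for the slope captures the true beta1.

    ASSUMPTIONS (hypothetical, not taken from appendix D): X ~ N(0, x_sd^2),
    sampling error ~ N(0, noise_sd^2), and the analyst regresses Y on
    W = X + U with U ~ N(0, meas_sd^2), i.e. classical measurement error.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, x_sd) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, noise_sd) for xi in x]
        w = [xi + rng.gauss(0.0, meas_sd) for xi in x]  # predictor observed with error
        _, lo, hi = ols_slope_ci(w, y)
        hits += lo <= beta1 <= hi  # True counts as 1
    return hits / reps


if __name__ == "__main__":
    for n in (50, 500):
        for meas_sd in (0.0, 1.0, 3.0):
            rr = reproducibility_rate(beta1=20.0, n=n, meas_sd=meas_sd, reps=200)
            print(f"n={n:3d}  meas_sd={meas_sd:.1f}  RR={rr:.2f}")
```

With no measurement error the rate should sit near the nominal 95%; as `meas_sd` grows, the slope estimate is attenuated toward zero and the interval stops covering the true value, so the rate falls.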
Figure 2.
An example of almost perfectly reproducible false results in a misspecified simple linear regression model with measurement error. The colour map shows the reproducibility rate (RR). The darkest blue cells indicate a near-perfect reproducibility rate (almost 100%) of false results at the appropriate measurement error for each false effect size; each false effect size is shown on the vertical axis by its distance from the true effect size. The true regression coefficient of the predictor variable (effect size) is 20. Details are given in appendix D. For a description of the letters and arrows, refer to the text.
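One way to see why a false effect can be reproduced consistently is the classical attenuation result for additive measurement error: if W = X + U with U independent of X, the least-squares slope of Y on W converges to beta * var(X) / (var(X) + var(U)). Choosing var(U) appropriately therefore drives the estimate to any chosen false value between 0 and the true slope, and replications will keep landing there. The sketch below assumes that classical error model with normal X; it is an illustration of the mechanism, not the paper's appendix D model, and the function names and default parameters are hypothetical.

```python
import math
import random


def attenuation(x_sd, meas_sd):
    """Reliability ratio var(X) / (var(X) + var(U)) under classical additive error."""
    return x_sd ** 2 / (x_sd ** 2 + meas_sd ** 2)


def meas_sd_for_target(beta_true, beta_target, x_sd=1.0):
    """Measurement-error SD that attenuates beta_true to beta_target (0 < target < true)."""
    if not 0 < beta_target < beta_true:
        raise ValueError("beta_target must lie strictly between 0 and beta_true")
    return x_sd * math.sqrt(beta_true / beta_target - 1.0)


def mean_estimated_slope(beta_true, meas_sd, n=500, reps=200,
                         x_sd=1.0, noise_sd=3.0, seed=1):
    """Average OLS slope of Y on the error-laden W across simulated replications.

    ASSUMPTIONS (hypothetical): intercept 2, X ~ N(0, x_sd^2),
    sampling error ~ N(0, noise_sd^2), U ~ N(0, meas_sd^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, x_sd) for _ in range(n)]
        y = [2.0 + beta_true * xi + rng.gauss(0.0, noise_sd) for xi in x]
        w = [xi + rng.gauss(0.0, meas_sd) for xi in x]
        mw, my = sum(w) / n, sum(y) / n
        sww = sum((wi - mw) ** 2 for wi in w)
        swy = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
        total += swy / sww
    return total / reps


if __name__ == "__main__":
    for target in (5.0, 10.0, 15.0):
        sd = meas_sd_for_target(20.0, target)
        print(f"target={target:4.1f}  meas_sd={sd:.3f}  "
              f"mean slope={mean_estimated_slope(20.0, sd):.2f}")
```

Each replication's estimate clusters around the chosen false value rather than the true slope of 20, which is why a result can be highly reproducible and still wrong.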
Figure 3.
For a normally distributed variable with equal mean and variance, we randomly sample a single observation from the population and plan to use it as a test statistic for the common parameter. Before performing the test, however, we observe the absolute value of the observation and decide to use the information in both the observation and its absolute value, thereby using the unsigned part twice. The plot compares the power of the test based on the single observation with that of the test based on the single observation conditioned on its absolute value. Conditioning improves inference by reducing the variance of the test statistic. This case corresponds to the left block, first row, second column in box 2. Lighter shades represent larger true parameter values. Technical details are given in appendix D.
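A short calculation shows why conditioning on |X| can reduce variance here. For X ~ N(theta, theta), the density is proportional to theta^(-1/2) exp(-x^2/(2 theta) + x - theta/2), so given |X| = a the sign is positive with probability e^a / (e^a + e^(-a)), which does not depend on theta. Hence E[X | |X|] = |X| tanh(|X|) is a statistic that, by the Rao-Blackwell theorem, has the same mean theta as X but no larger variance. The sketch below checks this by simulation. It is an illustration of the variance-reduction argument under these assumptions, not a reproduction of the power curves in the figure or of the test defined in appendix D; the function names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def conditioned_statistic(x):
    """E[X | |X|] for X ~ N(theta, theta): |X| * tanh(|X|), free of theta."""
    a = abs(x)
    return a * math.tanh(a)


def compare(theta, draws=20000, seed=2):
    """Monte Carlo mean and variance of X and of E[X | |X|] for a given theta > 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(theta, math.sqrt(theta)) for _ in range(draws)]  # variance = theta
    ts = [conditioned_statistic(x) for x in xs]
    return fmean(xs), pvariance(xs), fmean(ts), pvariance(ts)


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0, 4.0):
        mx, vx, mt, vt = compare(theta)
        print(f"theta={theta}: mean X={mx:.3f} var X={vx:.3f} | "
              f"mean T={mt:.3f} var T={vt:.3f}")
```

Both statistics are unbiased for theta, but the conditioned one has smaller variance, with the largest gains at small theta, where negative observations are common.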
Figure 4.
For a two-sample z-test, we display rejection regions for an unconditional test and a conditional test, with the alternative hypothesis set in the direction of the observed effect. The black curve shows the distribution of the unconditional test statistic, with critical value z. The orange curve shows the distribution of the conditional test statistic, with adjusted critical value z*.
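The adjusted critical value can be derived in one line, assuming the null hypothesis is a zero difference and the conditioning event is the sign of the observed statistic. Under H0 the statistic Z is standard normal and symmetric, so P(Z > c | Z > 0) = 2(1 - Phi(c)); setting this to alpha gives z* = Phi^(-1)(1 - alpha/2), whereas a one-sided test with its direction fixed in advance uses z = Phi^(-1)(1 - alpha). The sketch below computes both values and checks by simulation that choosing the direction after seeing the data but keeping z roughly doubles the Type I error, while the conditional cutoff z* restores it. Known common standard deviation, equal group sizes, and the helper names are assumptions for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_values(alpha=0.05):
    """(z, z*): one-sided cutoff with a pre-set direction, and the conditional cutoff."""
    z = STD_NORMAL.inv_cdf(1 - alpha)           # direction fixed before seeing data
    z_star = STD_NORMAL.inv_cdf(1 - alpha / 2)  # P(Z > z* | Z > 0) = alpha under H0
    return z, z_star


def two_sample_z(xs, ys, sigma):
    """Two-sample z statistic for a difference in means with known common SD sigma."""
    n, m = len(xs), len(ys)
    diff = sum(xs) / n - sum(ys) / m
    return diff / (sigma * (1 / n + 1 / m) ** 0.5)


def type_one_error(reps=10000, n=20, sigma=1.0, alpha=0.05, seed=3):
    """Simulated rejection rates under H0 when the alternative follows the observed sign."""
    rng = random.Random(seed)
    z, z_star = critical_values(alpha)
    naive = conditional = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        ys = [rng.gauss(0.0, sigma) for _ in range(n)]
        stat = abs(two_sample_z(xs, ys, sigma))  # test in the direction of the observed effect
        naive += stat > z
        conditional += stat > z_star
    return naive / reps, conditional / reps


if __name__ == "__main__":
    z, z_star = critical_values()
    naive, cond = type_one_error()
    print(f"z = {z:.3f}, z* = {z_star:.3f}")
    print(f"Type I error, post hoc direction with z:  {naive:.3f}")
    print(f"Type I error, conditional test with z*:   {cond:.3f}")
```

At alpha = 0.05 this gives z of about 1.645 and z* of about 1.960, with simulated error rates near 0.10 and 0.05 respectively.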

