PLoS Biol. 2025 Apr 22;23(4):e3003019.
doi: 10.1371/journal.pbio.3003019. eCollection 2025 Apr.

Testing the reproducibility of ecological studies on insect behavior in a multi-laboratory setting identifies opportunities for improving experimental rigor

Carolin Mundinger et al. PLoS Biol. 2025.

Abstract

The reproducibility of studies involving insect species is an underexplored area in the broader discussion about poor reproducibility in science. Our study addresses this gap with a systematic multi-laboratory investigation into the reproducibility of ecological studies on insect behavior. We implemented a 3 × 3 experimental design comprising three study sites and three independent experiments on three insect species from different orders: the turnip sawfly (Athalia rosae, Hymenoptera), the meadow grasshopper (Pseudochorthippus parallelus, Orthoptera), and the red flour beetle (Tribolium castaneum, Coleoptera). Using random-effect meta-analysis, we compared the consistency and accuracy of treatment effects on insect behavioral traits across replicate experiments. We successfully reproduced the overall statistical treatment effect in 83% of the replicate experiments, but overall effect size replication was achieved in only 66% of the replicates. Thus, although it demonstrates sufficient reproducibility in some measures, this study also provides the first experimental evidence for cases of poor reproducibility in insect experiments. Our findings further show that the causes of poor reproducibility identified in rodent research also apply to other study organisms and research questions. We believe that current best practices need to be rethought to address reproducibility issues in insect studies and across disciplines. Specifically, we advocate adopting open research practices and implementing methodological strategies that reduce bias and the problems arising from over-standardization. With respect to the latter, introducing systematic variation through multi-laboratory or heterogenized designs may contribute to improved reproducibility in studies involving any living organisms.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of the 3 × 3 design with the participating laboratories, study species, and research questions of the respective experiments.
PCI: post-contact immobility. Photo credit: Athalia: Pragya Singh; Pseudochorthippus: Holger Schielzeth; Tribolium: Tobias Prueser.
Fig 2. Effect of treatment (control or starvation) on the post-contact immobility (PCI) duration (A) across laboratories and (B) within laboratories, and on the distance moved (C) across laboratories and (D) within laboratories, in A. rosae. Data are presented as box plots showing medians, 25th and 75th percentiles (lower and upper box edges), and 5th and 95th percentiles (lower and upper whiskers). Statistics: Wilcoxon signed-rank test, two-sided, *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001. Model results from the GLMM on the transformed data show the same direction of significant differences (see Table 2). Raw data and code needed to reproduce this figure are available at https://zenodo.org/records/14002690; the summary data presented in the figure are additionally listed in Table 2.
Fig 3. Variation of treatment differences across the three replicate experiments for (A) the post-contact immobility (PCI) duration and (B) distance moved in larvae of A. rosae, (C) the substrate preference in the two color morphs of P. parallelus, and (D) the niche preference in the two life stages of T. castaneum. The black solid line reflects the null effect. The red dashed line and gray area indicate the overall absolute (abs) effect size and its corresponding 95% confidence interval (CI95). Dots and vertical dashed lines reflect the mean group differences (as estimated from the GLMMs) and the corresponding CI95 for each of the three laboratories. The overall absolute effect size was estimated by a random-effect meta-analysis based on the individual treatment effect sizes and standard errors of all laboratories. Raw data and code needed to reproduce this figure are available at https://zenodo.org/records/14002690; the summary data presented in the figure are listed in the S4A–D Table.
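The pooling step described in the Fig 3 caption, which combines per-laboratory effect sizes and standard errors in a random-effect meta-analysis, can be illustrated with a minimal sketch. The caption does not say which between-laboratory variance estimator was used, so the DerSimonian-Laird estimator below is an assumption, as are the function name `random_effects_pool`, the `PooledEffect` type, and the example numbers, which are purely illustrative and not taken from the study. The authors' actual analysis code is in the Zenodo record cited in the caption.

```python
import math
from typing import NamedTuple, Sequence


class PooledEffect(NamedTuple):
    estimate: float      # pooled effect size
    std_error: float     # standard error of the pooled effect
    ci_low: float        # lower bound of the 95% confidence interval
    ci_high: float       # upper bound of the 95% confidence interval
    tau_squared: float   # estimated between-laboratory variance


def random_effects_pool(effects: Sequence[float],
                        std_errors: Sequence[float]) -> PooledEffect:
    """Pool per-laboratory effects with a DerSimonian-Laird random-effects model.

    Assumption: the DerSimonian-Laird estimator is chosen for illustration;
    the source does not specify the estimator used.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two effects with matching standard errors")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w

    # Between-laboratory variance, truncated at zero.
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-laboratory variance.
    re_weights = [1.0 / (v + tau_squared) for v in variances]
    sum_re = sum(re_weights)
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    std_error = math.sqrt(1.0 / sum_re)

    z = 1.959963984540054  # two-sided 95% quantile of the standard normal
    return PooledEffect(estimate, std_error,
                        estimate - z * std_error,
                        estimate + z * std_error,
                        tau_squared)


if __name__ == "__main__":
    # Hypothetical effect sizes and standard errors for three laboratories.
    pooled = random_effects_pool([0.42, 0.15, 0.58], [0.12, 0.10, 0.15])
    print(pooled)
```

When the laboratories disagree more than their standard errors would predict, `tau_squared` becomes positive and the pooled confidence interval widens, which is the behavior that makes a random-effects model suitable for comparing replicate experiments across sites.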
Fig 4. Preference for the green background out of the two presented backgrounds (green and brown) in the different color morphs of the meadow grasshopper P. parallelus (A) across all laboratories and (B) within each laboratory. Plotted is the preference for sitting on the green background, in percent, calculated from all observations (8,784 positions across 185 individual grasshoppers). Data are presented as box plots showing medians, 25th and 75th percentiles (lower and upper box edges), and 5th and 95th percentiles (lower and upper whiskers). Statistics: Wilcoxon signed-rank test, two-sided. Model results from the GLMM on the transformed data show the same direction of significant differences (see Table 2). Raw data and code needed to reproduce this figure are available at https://zenodo.org/records/14002690; the summary data presented in the figure are listed in the S5A and S5B Table.
Fig 5. Effect of treatment (life stage: larvae vs. adult) on the flour preference in the red flour beetle T. castaneum (A) across laboratories and (B) within laboratories. Data are presented as box plots showing medians, 25th and 75th percentiles (lower and upper box edges), and 5th and 95th percentiles (lower and upper whiskers). Statistics: Wilcoxon signed-rank test, two-sided, on the untransformed data, *p < 0.05. Model results from the GLMM on the transformed data show the same direction of significant differences (see Table 2). Raw data and code needed to reproduce this figure are available at https://zenodo.org/records/14002690; the summary data presented in the figure are listed in the S6A and S6B Table.

