Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates

PLoS One. 2025 May 19;20(5):e0321452. doi: 10.1371/journal.pone.0321452. eCollection 2025.

Eva Kohnert¹, Clemens Kreutz¹

PMID: 40388544
PMCID: PMC12088514
DOI: 10.1371/journal.pone.0321452

Eva Kohnert et al. PLoS One. 2025.

Affiliation

¹ Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany.

Abstract

Differential abundance (DA) analysis of metagenomic microbiome data is essential for understanding microbial community dynamics across various environments and hosts. Identifying microorganisms that differ significantly in abundance between conditions (e.g., health vs. disease) is crucial for insights into environmental adaptations, disease development, and host health. However, the statistical interpretation of microbiome data is challenged by inherent sparsity and compositional nature, necessitating tailored DA methods. This benchmarking study aims to simulate synthetic 16S microbiome data using metaSPARSim (Patuzzi I, Baruzzo G, Losasso C, Ricci A, Di Camillo B. MetaSPARSim: a 16S rRNA gene sequencing count data simulator. BMC Bioinformatics. 2019;20:416. https://doi.org/10.1186/s12859-019-2882-6 PMID: 31757204), MIDASim (He M, Zhao N, Satten GA. MIDASim: a fast and simple simulator for realistic microbiome data. Available from: https://doi.org/10.1101/2023.03.23.533996), and sparseDOSSA2 (Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, et al. A statistical model for describing and simulating microbial community profiles. PLOS Comput Biol. 2021;17(9):e1008913. https://doi.org/10.1371/journal.pcbi.1008913 PMID: 34516542), leveraging 38 real-world experimental templates (S3 Table) previously utilized in a benchmark study comparing DA tools. These datasets, drawn from diverse environments such as human gut, soil, and marine habitats, serve as the foundation for our simulation efforts. We employ the same 14 DA tests that were previously used with the same experimental data in benchmark studies alongside 8 DA tests that were developed subsequently. Initially, we will generate synthetic data closely mirroring the experimental datasets, incorporating a known truth to cover a broad range of real-world data characteristics. This approach allows us to assess the ability of DA methods to recover known true differential abundances.
We will further simulate datasets by altering sparsity, effect size, and sample size, thus creating a comprehensive collection for applying the 22 DA tests. The outcomes, focusing on sensitivities and specificities, will provide insights into the performance of DA tests and their dependencies on sparsity, effect size, and sample size. Additionally, we will calculate data characteristics (S1 and S2 Tables) for each simulated dataset and use multiple regression to identify informative data characteristics influencing test performance. Our prior study, where we used simulated data without incorporating a known truth, demonstrated the feasibility of using synthetic data to validate experimental findings. The current study aims to enhance our understanding by systematically evaluating the impact of known truth incorporation on DA test performance, thereby providing further information for the selection and application of DA methods in microbiome research.
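As an illustration of how sensitivity and specificity can be computed once a known truth is available, the following minimal Python sketch scores one DA test on one simulated dataset. The function name, the example p-values, and the 0.05 threshold are assumptions made for illustration; they are not taken from the study's code or analysis plan.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool],
                            called: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one DA test on one dataset.

    truth[i]  -- True if taxon i was simulated as differentially abundant
    called[i] -- True if the DA test reported taxon i as significant
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fn = sum(1 for t, c in pairs if t and not c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives
    fp = sum(1 for t, c in pairs if not t and c)      # false positives
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical example: 6 taxa, the first 3 simulated as truly differential.
truth = [True, True, True, False, False, False]
p_values = [0.001, 0.20, 0.03, 0.60, 0.04, 0.90]  # invented test output
alpha = 0.05                                      # assumed significance level
called = [p < alpha for p in p_values]

sens, spec = sensitivity_specificity(truth, called)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this invented example, two of the three truly differential taxa are detected and one of the three unchanged taxa is falsely called, so both measures equal 2/3.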

Copyright: © 2025 Kohnert, Kreutz. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Preliminary results assessing the similarity of simulated data and corresponding templates.**
A. Overall similarity of simulated data and templates for metaSPARSim. PCA plot of 46 scaled data characteristics for 38 templates and 10 corresponding simulations. Templates are plotted as squares and simulations as dots in the same colour. B. Accuracy of four representative single data characteristics. Overall magnitudes of visible bias and heterogeneities are highlighted by blue arrows. The left sections in all panels show the natural variability of a specific data characteristic among the templates: the log2-ratios of the data characteristic from one template to all others are summarized as a boxplot. The middle sections display the precision of the data characteristic in the simulations compared to the corresponding template. The right sections show log2-ratios of the data characteristic between all simulations belonging to the same template.
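The log2-ratio comparison described for panel B can be sketched as follows. This is a minimal Python illustration that uses sparsity (the fraction of zero counts) as the data characteristic; the count matrices, helper names, and the choice of sparsity are assumptions for the example and do not reproduce the 46 characteristics used in the study.

```python
import math
from typing import List

CountMatrix = List[List[int]]  # rows = taxa, columns = samples


def sparsity(counts: CountMatrix) -> float:
    """Fraction of zero entries in a taxa-by-samples count matrix."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for x in row if x == 0)
    return zeros / total


def log2_ratio(simulated_value: float, template_value: float) -> float:
    """log2(simulated / template); 0 means the simulation matches exactly."""
    if simulated_value <= 0 or template_value <= 0:
        raise ValueError("log2-ratio is undefined for non-positive values")
    return math.log2(simulated_value / template_value)


# Invented toy data: one template and two simulations derived from it.
template = [[0, 5, 0, 12], [3, 0, 0, 7], [0, 0, 1, 0]]
simulations = [
    [[0, 4, 0, 10], [0, 0, 1, 9], [0, 0, 0, 0]],  # sparser than the template
    [[1, 6, 2, 11], [0, 0, 0, 5], [0, 2, 0, 0]],  # less sparse
]

template_sparsity = sparsity(template)
ratios = [log2_ratio(sparsity(sim), template_sparsity) for sim in simulations]
for i, r in enumerate(ratios, start=1):
    print(f"simulation {i}: log2-ratio of sparsity = {r:+.3f}")
```

A positive ratio indicates that the simulation overestimates the characteristic relative to its template and a negative ratio that it underestimates it; collecting such ratios over all simulations of a template gives the kind of distribution that a boxplot summarizes.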

**Fig 2. Overview of the data-generating mechanism, including the dataset selection process.**
Flowchart summarizing the data generation and selection mechanism throughout the entire workflow of the study.

**Fig 3. Illustration of the detection of outlier datasets after simulation.**
Each dot represents the number of non-equivalent data characteristics for a data template. If this number is an outlier in the boxplot, the synthetic data from this template will be removed from the analysis. A. If sparseDOSSA2 produced such an outcome, the synthetic dataset for the template MALL would be removed. B. If metaSPARSim produced such a boxplot, two data templates (Ji_WTP_DS and t1d_alkanani) would be removed from the analysis based on the outlier criteria.
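The exclusion rule in this caption can be sketched with the conventional boxplot criterion. The Python example below flags templates whose number of non-equivalent data characteristics lies above the upper fence Q3 + 1.5 × IQR. The fence multiplier, the quartile method, the restriction to the upper side, and all template names and counts are assumptions for illustration; the study's exact outlier criterion may differ.

```python
from statistics import quantiles
from typing import Dict, List


def upper_outliers(n_non_equivalent: Dict[str, int], k: float = 1.5) -> List[str]:
    """Return template names whose count exceeds Q3 + k * IQR.

    n_non_equivalent maps a template name to the number of data
    characteristics judged non-equivalent between template and simulation.
    """
    values = list(n_non_equivalent.values())
    # Quartiles by linear interpolation over the observed range (assumed
    # method; boxplot implementations differ slightly in how they compute hinges).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return sorted(name for name, v in n_non_equivalent.items() if v > upper_fence)


# Invented counts for seven hypothetical templates.
counts = {
    "template_A": 2, "template_B": 3, "template_C": 1, "template_D": 4,
    "template_E": 2, "template_F": 3, "template_G": 15,
}

flagged = upper_outliers(counts)
print("templates flagged for removal:", flagged)
```

Only the high side is checked here, on the assumption that a template is problematic when unusually many characteristics fail to match, not when unusually few do.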

**Fig 4. Overview of the complete analysis workflow including the data simulation process.**
Fig 4 provides an overview of the analyses conducted within our study that are described in the following sections.

References

1. Nearing JT, Douglas GM, Hayes MG, MacDonald J, Desai DK, Allward N, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun. 2022;13(1):342. doi: 10.1038/s41467-022-28034-z
2. Kohnert E, Kreutz C. Computational study protocol: leveraging synthetic data to validate a benchmark study for Differential Abundance Tests for 16S microbiome sequencing data. F1000Research. 2025 Jan 2;13:1180.
3. Patuzzi I, Baruzzo G, Losasso C, Ricci A, Di Camillo B. MetaSPARSim: a 16S rRNA gene sequencing count data simulator. BMC Bioinformatics. 2019;20:416. doi: 10.1186/s12859-019-2882-6
4. Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, et al. A statistical model for describing and simulating microbial community profiles. PLOS Comput Biol. 2021;17(9):e1008913. doi: 10.1371/journal.pcbi.1008913
5. He M, Zhao N, Satten GA. MIDASim: a fast and simple simulator for realistic microbiome data. Available from: doi: 10.1101/2023.03.23.533996
