Microbiome Datasets Are Compositional: And This Is Not Optional
- PMID: 29187837
- PMCID: PMC5695134
- DOI: 10.3389/fmicb.2017.02224
Abstract
Datasets collected by high-throughput sequencing (HTS) of 16S rRNA gene amplimers, metagenomes, or metatranscriptomes are commonplace and are used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional because they have an arbitrary total imposed by the instrument. However, many investigators are either unaware of this or assume specific properties of the compositional data. The purpose of this review is to alert investigators to the dangers inherent in ignoring the compositional nature of the data, and to point out that HTS datasets derived from microbiome studies can and should be treated as compositions at all stages of analysis. We briefly introduce compositional data, illustrate the pathologies that occur when compositional data are analyzed inappropriately, and finally give guidance and point to resources and examples for analyzing microbiome datasets with compositional data analysis.
Keywords: Bayesian estimation; compositional data; correlation; count normalization; high-throughput sequencing; microbiota; relative abundance.
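The abstract's central claim, that an arbitrary total makes the data compositional, can be illustrated with a small sketch. The counts below are hypothetical, and the helper names `closure` and `clr` are illustrative choices rather than anything taken from the review. The centered log-ratio (clr) is used here as one standard transform from compositional data analysis; handling of zero counts is deliberately left out.

```python
import math


def closure(counts):
    """Rescale counts to proportions that sum to 1 (an arbitrary total)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean log of all parts.

    Assumes strictly positive parts; zero handling is out of scope here.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical absolute abundances of taxa A, B, C in two conditions.
# Only taxon A changes (tenfold increase); B and C are unchanged.
before = [100, 100, 100]
after = [1000, 100, 100]

prop_before = closure(before)
prop_after = closure(after)

# B's proportion drops although its absolute abundance did not change.
print("B proportion before:", prop_before[1])
print("B proportion after: ", prop_after[1])

# The ratio between B and C is unaffected by the change in A.
ratio_before = math.log(prop_before[1] / prop_before[2])
ratio_after = math.log(prop_after[1] / prop_after[2])
print("log(B/C) before and after:", ratio_before, ratio_after)

# The same community sequenced at two depths (hypothetical read counts):
# the totals differ by a factor of 100, but the clr values agree.
shallow = [10, 20, 70]
deep = [c * 100 for c in shallow]
print("clr shallow:", clr(shallow))
print("clr deep:   ", clr(deep))
```

In this toy case the proportion of B falls only because A grew, which is the kind of spurious change that arises when proportions are read as abundances. Ratios between parts, and log-ratio transforms built from them, do not depend on the total imposed by the instrument.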
