Bayesian Variable Shrinkage and Selection in Compositional Data Regression: Application to Oral Microbiome

Jyotishka Datta¹, Dipankar Bandyopadhyay²

Affiliations

¹ Department of Statistics, Virginia Polytechnic Institute and State University, 250 Drillfield Drive, Blacksburg, VA 24061 USA.
² Department of Biostatistics, School of Population Health, Virginia Commonwealth University, One Capital Square, 7th Floor, 830 East Main Street, PO Box 980032, Richmond, VA 23298-0032 USA.

PMID: 39403125
PMCID: PMC11470902
DOI: 10.1007/s41096-024-00194-9

Jyotishka Datta et al. J Indian Soc Probab Stat. 2024;25(2):491-515. doi: 10.1007/s41096-024-00194-9. Epub 2024 May 29.

Abstract

Microbiome studies generate multivariate compositional responses, such as taxa counts, which are strictly non-negative, bounded, reside within a simplex, and are subject to a unit-sum constraint. In the presence of covariates (which can be moderate to high dimensional), they are popularly modeled via the Dirichlet-Multinomial (D-M) regression framework. In this paper, we consider a Bayesian approach for estimation and inference under a D-M compositional framework, and present a comparative evaluation of some state-of-the-art continuous shrinkage priors for efficient variable selection to identify the most significant associations between available covariates and taxonomic abundance. Specifically, we compare the performance of the horseshoe and horseshoe+ priors (with the Bayesian lasso as the benchmark), utilizing Hamiltonian Monte Carlo techniques for posterior sampling and for generating posterior credible intervals. Our simulation studies using synthetic data demonstrate excellent recovery and estimation accuracy in the sparse parameter regime by the continuous shrinkage priors. We further illustrate our method via application to a motivating oral microbiome dataset generated from the NYC-Hanes study. An RStan implementation of our method is available on GitHub: https://github.com/dattahub/compshrink.

Keywords: Bayesian; Compositional data; Dirichlet; Generalized Dirichlet; Horseshoe; Large p; Shrinkage prior; Sparse probability vectors; Stick-breaking.
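
The abstract does not spell out the model equations, so the following is only a minimal sketch of the kind of setting it describes: Dirichlet-multinomial counts driven by a sparse coefficient matrix, plus draws from a horseshoe prior. It assumes a log link between the covariates and the Dirichlet concentration parameters, which is a common convention for D-M regression but is not confirmed by this page. The function names (`simulate_dm`, `draw_horseshoe`) and all default values are hypothetical, and this is not the authors' RStan implementation.

```python
import numpy as np


def simulate_dm(n=50, p=10, J=5, n_nonzero=6, depth=1000, seed=0):
    """Simulate Dirichlet-multinomial counts with a sparse coefficient matrix.

    All sizes and effect magnitudes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                 # covariates
    beta = np.zeros((p, J))                     # mostly-zero coefficients
    idx = rng.choice(p * J, size=n_nonzero, replace=False)
    beta.flat[idx] = rng.choice([-1.0, 1.0], size=n_nonzero)

    alpha = np.exp(X @ beta)                    # assumed log link
    Y = np.empty((n, J), dtype=int)
    for i in range(n):
        probs = rng.dirichlet(alpha[i])         # subject-level composition
        Y[i] = rng.multinomial(depth, probs)    # taxa counts, fixed depth
    return X, Y, beta


def draw_horseshoe(size, tau=1.0, seed=0):
    """Draw from a horseshoe prior: normal scale mixture with
    half-Cauchy local scales and a fixed global scale tau."""
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(size))     # half-Cauchy(0, 1)
    return rng.normal(size=size) * lam * tau


if __name__ == "__main__":
    X, Y, beta = simulate_dm()
    print("row sums equal depth:", bool((Y.sum(axis=1) == 1000).all()))
    print("non-zero coefficients:", int(np.count_nonzero(beta)))
    draws = draw_horseshoe(10_000)
    print("share of horseshoe draws with |beta| < 0.1:",
          float(np.mean(np.abs(draws) < 0.1)))
```

Dividing each row of `Y` by its row sum gives the compositional (unit-sum) representation mentioned in the abstract. The horseshoe draws illustrate why such priors suit sparse problems: much of the mass sits near zero while the heavy tails leave large signals largely unshrunk.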

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

**Fig. 1**
Functional forms of common global-local (G-L) priors, i.e., Cauchy, Horseshoe (HS), Horseshoe+ (HS+), and Laplace, near zero (left panel) and in the tails (right panel). The x-axis represents values of $λ$, and the y-axis the values of $π(λ)$

**Fig. 2**
Taxonomic composition of top 5% OTUs for the NYC-Hanes Data

**Fig. 3**
Rank Correlation heatmap for the NYC-Hanes Data

**Fig. 4**
Selected $β_{i,j}$'s from fitting the three competing methods to the NYC-Hanes data

**Fig. 5**
Density plots of the dissimilarity measures corresponding to the horseshoe, horseshoe+, and Laplace prior (Bayesian Lasso) assumptions in our proposed D-M model

**Fig. 6**
Simulation scheme I(a) results, evaluating recovery of true non-zero associations (left panel), and estimation accuracy (right panel)

**Fig. 7**
Simulation scheme I(b) results, comparing recovery of true non-zero associations, via the 95% credible interval method (upper panel), and the 2-means model (lower panel)

**Fig. 8**
Simulation scheme II results, presenting boxplots of estimation (left panel) and misclassification (right panel) errors, corresponding to the three shrinkage priors, i.e., horseshoe, horseshoe+ and Bayesian Lasso
