J Indian Soc Probab Stat. 2024;25(2):491-515.
doi: 10.1007/s41096-024-00194-9. Epub 2024 May 29.

Bayesian Variable Shrinkage and Selection in Compositional Data Regression: Application to Oral Microbiome

Jyotishka Datta et al. J Indian Soc Probab Stat. 2024.

Abstract

Microbiome studies generate multivariate compositional responses, such as taxa counts, which are strictly non-negative, bounded, reside within a simplex, and are subject to a unit-sum constraint. In the presence of covariates (which can be moderate to high dimensional), they are popularly modeled via the Dirichlet-Multinomial (D-M) regression framework. In this paper, we consider a Bayesian approach for estimation and inference under a D-M compositional framework, and present a comparative evaluation of some state-of-the-art continuous shrinkage priors for efficient variable selection, with the goal of identifying the most significant associations between available covariates and taxonomic abundance. Specifically, we compare the performance of the horseshoe and horseshoe+ priors (with the Bayesian lasso as the benchmark), using Hamiltonian Monte Carlo techniques for posterior sampling and for generating posterior credible intervals. Our simulation studies using synthetic data demonstrate excellent recovery and estimation accuracy in the sparse parameter regime by the continuous shrinkage priors. We further illustrate our method via application to a motivating oral microbiome data set generated from the NYC-Hanes study. An RStan implementation of our method is available on GitHub (https://github.com/dattahub/compshrink).

Keywords: Bayesian; Compositional data; Dirichlet; Generalized Dirichlet; Horseshoe; Large p; Shrinkage prior; Sparse probability vectors; Stick-breaking.
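
To make the model components named in the abstract concrete, the following is a minimal Python sketch, not the authors' RStan implementation. It evaluates the standard Dirichlet-Multinomial log-probability for one sample of taxa counts and draws a coefficient from a horseshoe prior (a normal scale mixture with a half-Cauchy local scale). The log link `alpha_j = exp(x' beta_j)` and the helper names `dm_log_pmf`, `alpha_from_covariates`, and `horseshoe_draw` are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import math
import random


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-Multinomial.

    counts: non-negative integer counts, one per taxon.
    alpha:  positive Dirichlet concentration parameters, same length.
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(y_j!)
    out = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Ratio of gamma functions from integrating out the Dirichlet weights
    out += math.lgamma(a0) - math.lgamma(n + a0)
    out += sum(math.lgamma(y + a) - math.lgamma(a) for y, a in zip(counts, alpha))
    return out


def alpha_from_covariates(x, beta):
    """Assumed log link: alpha_j = exp(x . beta_j) for each taxon j.

    x:    covariate vector for one sample.
    beta: list of coefficient vectors, one per taxon.
    """
    return [math.exp(sum(xi * bi for xi, bi in zip(x, b_j))) for b_j in beta]


def horseshoe_draw(rng, tau=1.0):
    """One draw beta ~ N(0, (lambda * tau)^2) with lambda ~ half-Cauchy(0, 1)."""
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # inverse-CDF half-Cauchy
    return rng.gauss(0.0, lam * tau)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy setup: 3 taxa, 2 covariates, coefficients from the prior.
    beta = [[horseshoe_draw(rng, tau=0.5) for _ in range(2)] for _ in range(3)]
    alpha = alpha_from_covariates([1.0, 0.3], beta)
    print(dm_log_pmf([5, 0, 2], alpha))
```

The heavy half-Cauchy tail lets a few coefficients escape shrinkage while most draws concentrate near zero, which is the behavior the abstract relies on for variable selection.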

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Functional forms of common G-L priors, i.e., Cauchy, Horseshoe (HS), Horseshoe+ (HS+), and Laplace, near zero (left panel) and in the tails (right panel). The x-axis shows values of λ and the y-axis shows the corresponding values of π(λ)
Fig. 2
Taxonomic composition of top 5% OTUs for the NYC-Hanes Data
Fig. 3
Rank correlation heatmap for the NYC-Hanes Data
Fig. 4
Selected βi,j’s from fitting the three competing methods to the NYC-Hanes data
Fig. 5
Density plots of the dissimilarity measures corresponding to the horseshoe, horseshoe+, and Laplace prior (Bayesian Lasso) assumptions in our proposed D-M model
Fig. 6
Simulation scheme I(a) results, evaluating recovery of true non-zero associations (left panel) and estimation accuracy (right panel)
Fig. 7
Simulation scheme I(b) results, comparing recovery of true non-zero associations via the 95% credible interval method (upper panel) and the 2-means model (lower panel)
Fig. 8
Simulation scheme II results, presenting boxplots of estimation (left panel) and misclassification (right panel) errors corresponding to the three shrinkage priors, i.e., horseshoe, horseshoe+, and Bayesian Lasso
