Bioinformatics. 2016 Sep 1;32(17):2611-7.
doi: 10.1093/bioinformatics/btw308. Epub 2016 May 14.

A two-part mixed-effects model for analyzing longitudinal microbiome compositional data

Eric Z Chen et al. Bioinformatics.

Abstract

Motivation: The human microbial communities are associated with many human diseases such as obesity, diabetes and inflammatory bowel disease. High-throughput sequencing technology has been widely used to quantify the microbial composition in order to understand its impacts on human health. Longitudinal measurements of microbial communities are commonly obtained in many microbiome studies. A key question in such microbiome studies is to identify the microbes that are associated with clinical outcomes or environmental factors. However, microbiome compositional data are highly skewed, bounded in [0,1), and often sparse with many zeros. In addition, the observations from repeated measures in longitudinal studies are correlated. A method that takes into account these features is needed for association analysis in longitudinal microbiome data.

Results: In this paper, we propose a two-part zero-inflated Beta regression model with random effects (ZIBR) for testing the association between microbial abundance and clinical covariates in longitudinal microbiome data. The model includes a logistic regression component to model the presence/absence of a microbe in the samples and a Beta regression component to model non-zero microbial abundance, where each component includes a random effect to account for the correlations among the repeated measurements on the same subject. Both simulation studies and the application to real microbiome data show that the ZIBR model outperforms previously used methods. The method provides a useful tool for identifying relevant taxa based on longitudinal or repeated measures in microbiome research.
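The two-part structure described above can be illustrated with a small data-generating sketch. This is not the authors' implementation (that is the ZIBR package linked below); it only simulates data, and it assumes a mean-precision parameterization of the Beta distribution, a single binary subject-level covariate, and subject-level normal random intercepts. The function name `simulate_zibr` and all parameter names and default values are hypothetical, chosen for illustration.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_zibr(n_subjects=50, n_times=5,
                  alpha0=0.0, alpha1=0.5,   # logistic part (assumed names)
                  beta0=-2.0, beta1=0.5,    # Beta part (assumed names)
                  sigma1=1.0, sigma2=1.0,   # random-intercept std. deviations
                  phi=10.0, seed=0):
    """Simulate abundances in [0, 1) from a two-part model.

    Part 1 (logistic): presence probability p = expit(alpha0 + alpha1*x + a_i).
    Part 2 (Beta): if present, y ~ Beta(u*phi, (1-u)*phi) with mean
    u = expit(beta0 + beta1*x + b_i).
    a_i and b_i are per-subject random intercepts shared across time points,
    which induces correlation among repeated measures on the same subject.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        x = i % 2                      # illustrative binary covariate
        a_i = rng.gauss(0.0, sigma1)   # random intercept, logistic part
        b_i = rng.gauss(0.0, sigma2)   # random intercept, Beta part
        p = expit(alpha0 + alpha1 * x + a_i)
        u = expit(beta0 + beta1 * x + b_i)
        for t in range(n_times):
            if rng.random() < p:
                y = rng.betavariate(u * phi, (1.0 - u) * phi)
            else:
                y = 0.0                # microbe absent in this sample
            rows.append((i, t, x, y))
    return rows


if __name__ == "__main__":
    data = simulate_zibr()
    zero_fraction = sum(1 for row in data if row[3] == 0.0) / len(data)
    print(f"{len(data)} observations, zero fraction = {zero_fraction:.2f}")
```

In this sketch a larger `alpha0` raises the presence probability and so lowers the share of zeros, while `beta1` shifts the mean of the non-zero abundances; testing association would mean testing the covariate effects in both parts.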

Availability and implementation: https://github.com/chvlyl/ZIBR

Contact: hongzhe@upenn.edu


Figures

Fig. 1.
Examples of two genera from the real human microbiome data. Red bars represent the density of the non-zero data (left Y axis). Black bars represent the zero proportion (right Y axis). Black curves show the fit of the non-zero data using a Beta distribution.
Fig. 2.
ROC curves for identifying association by ZIBR and LMM, where 1000 species were simulated and 400 of them had true association with the covariate. The simulations were carried out with N = 50 subjects and T = 5 time points for each subject. LMM is the linear mixed-effects model with arcsine square root transformation on the microbial abundance. The best cutoff and the corresponding specificity and sensitivity for each method are indicated, where the best cutoff is defined as the value such that the sum of sensitivity and specificity is the largest. (Color version of this figure is available at Bioinformatics online.)
Fig. 3.
Power curves for identifying association by ZIBR and LMM. In each plot, the power is plotted against the α₀ value, which controls the proportion of zeros in the data: a larger α₀ value indicates a smaller proportion of zeros. Four different scenarios were simulated (see Section 3 for details). The simulation for each α₀ value was repeated 10 000 times. (Color version of this figure is available at Bioinformatics online.)
Fig. 4.
Bacterial genera that showed different abundances between anti-TNF and EEN treatments identified by ZIBR and LMM after adjusting for the initial abundance. LMM identified seven genera, which were also identified by ZIBR. ZIBR identified four additional genera.
Fig. 5.
Four genera identified by ZIBR but not by LMM. Left panel shows the percentage of samples in EEN or anti-TNF groups where the genus was present. Right panel shows the non-zero abundance in EEN or anti-TNF groups, where the abundances were logit-transformed. (Color version of this figure is available at Bioinformatics online.)
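The captions for Fig. 2 and Fig. 5 refer to three small computations: the arcsine square root transformation applied before fitting the LMM, the logit transformation of non-zero abundances, and the "best cutoff" that maximizes sensitivity plus specificity. The following is a minimal sketch of each, assuming that p-values are the score being thresholded and that a feature is called associated when its p-value is at or below the cutoff. The function names and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def arcsine_sqrt(y):
    """Arcsine square root transform of a proportion y in [0, 1]."""
    return math.asin(math.sqrt(y))


def logit(y):
    """Logit transform of a proportion strictly between 0 and 1."""
    return math.log(y / (1.0 - y))


def best_cutoff(p_values, truth):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.

    p_values: one p-value per feature.
    truth: 1/True if the feature is truly associated, 0/False otherwise.
    Requires at least one truly associated and one null feature.
    Ties are resolved in favor of the smallest cutoff.
    """
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    best = None
    for c in sorted(set(p_values)):
        # True positives: associated features called at this cutoff.
        tp = sum(1 for p, t in zip(p_values, truth) if t and p <= c)
        # True negatives: null features not called at this cutoff.
        tn = sum(1 for p, t in zip(p_values, truth) if not t and p > c)
        sens = tp / n_pos
        spec = tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (c, sens, spec)
    return best


if __name__ == "__main__":
    # Toy example: four truly associated features, three null features.
    p_values = [0.001, 0.01, 0.03, 0.2, 0.04, 0.5, 0.8]
    truth = [1, 1, 1, 1, 0, 0, 0]
    print(best_cutoff(p_values, truth))
```

The logit is undefined at 0 and 1, which is one reason it is applied only to the non-zero abundances, whereas the arcsine square root is defined on the whole interval including zeros.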
