Genes (Basel). 2022 Jul 9;13(7):1224.
doi: 10.3390/genes13071224.

MicrobiomeGWAS: A Tool for Identifying Host Genetic Variants Associated with Microbiome Composition

Xing Hua et al. Genes (Basel).

Abstract

The microbiome is the collection of all microbial genes and can be investigated by sequencing highly variable regions of 16S ribosomal RNA (rRNA) genes. Evidence suggests that environmental factors and host genetics may interact to shape human microbiome composition. Identifying host genetic variants associated with human microbiome composition not only provides clues for characterizing microbiome variation but also helps to elucidate the biological mechanisms of genetic associations, prioritize genetic variants, and improve genetic risk prediction. Since a microbiota functions as a community, it is best characterized by β diversity; that is, a pairwise distance matrix. We develop a statistical framework and a computationally efficient software package, microbiomeGWAS, for identifying host genetic variants associated with microbiome β diversity, with or without an interaction with an environmental factor. We show that the score statistics have positive skewness and kurtosis due to the dependent nature of the pairwise data, which makes p-value approximations based on asymptotic distributions unacceptably liberal. By correcting for skewness and kurtosis, we develop accurate p-value approximations, whose accuracy we verified by extensive simulations. We exemplify our methods by analyzing a set of 147 genotyped subjects with 16S rRNA microbiome profiles from non-malignant lung tissues. Correcting for skewness and kurtosis eliminated the dramatic deviation in the quantile-quantile plots. We provide preliminary evidence that six established lung cancer risk SNPs were collectively associated with microbiome composition for both unweighted (p = 0.0032) and weighted (p = 0.011) UniFrac distance matrices. In summary, our methods will facilitate the analysis of large-scale genome-wide association studies of the human microbiome.

Keywords: gene–environment interaction; genome-wide association study; host genetics; microbiome; skewness and kurtosis; tail probabilities.
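
To illustrate why a normal approximation can be liberal for a positively skewed statistic, the sketch below compares the N(0,1) upper tail with a textbook Edgeworth expansion that includes skewness and excess kurtosis terms. This is a stand-in technique chosen for illustration and is not claimed to be the approximation implemented in microbiomeGWAS; the function names and the skewness value used in the usage note are hypothetical.

```python
import math


def normal_tail(b):
    """Upper tail P(Z > b) for a standard normal Z."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))


def edgeworth_tail(b, skew=0.0, excess_kurt=0.0):
    """Edgeworth approximation to P(Z > b) for a standardized statistic.

    Illustrative only: a generic expansion, not the paper's method.
    `skew` and `excess_kurt` are the third and fourth standardized
    cumulants of the statistic (both zero for an exact normal).
    """
    phi = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)  # normal density
    he2 = b**2 - 1.0                          # Hermite polynomial He2
    he3 = b**3 - 3.0 * b                      # Hermite polynomial He3
    he5 = b**5 - 10.0 * b**3 + 15.0 * b       # Hermite polynomial He5
    correction = (
        (skew / 6.0) * he2
        + (excess_kurt / 24.0) * he3
        + (skew**2 / 72.0) * he5
    )
    return normal_tail(b) + phi * correction
```

With a hypothetical skewness of 0.3, `edgeworth_tail(4.0, skew=0.3)` is several times larger than `normal_tail(4.0)`, so a p-value taken from N(0,1) would understate the true tail probability. Edgeworth expansions can themselves be inaccurate, or even negative, far in the tails, which is one reason a purpose-built approximation is preferable at genome-wide significance thresholds.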

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Microbiome distances are positively correlated with genetic distances at an associated SNP.
Figure 2
Definition of the joint test of $H_0: \beta_M = \beta_I = 0$ vs. $\beta_M > 0$ or $\beta_I > 0$. We assume that $Z_M \sim N(0,1)$, $Z_I \sim N(0,1)$, and $\mathrm{cor}(Z_M, Z_I) = \rho$ under $H_0$. Details are in Appendix C.
Figure 3
Correcting tail probabilities for skewness and kurtosis. (A) The standard normal distribution $N(0,1)$ and an approximately normal distribution with positive skewness. The skewness has a large impact when calculating the tail probability $P(Z > b)$ for a large value of $b$. (B) Numerical evaluation of the tail probability approximation for $Z_M$. We used the unweighted UniFrac distance matrix of 500 samples from the American Gut Project (AGP). For each value of $b$ ($> 0$), we calculated p-values $P(Z_M > b)$ based on $N(0,1)$, skewness correction, both skewness and kurtosis correction, and $10^8$ simulations. (C) Skewness depends on the minor allele frequency (MAF) of SNPs and the sample size of the study, calculated based on the weighted UniFrac distance matrix in the AGP data. (D) Kurtosis depends on the MAF of SNPs and the sample size, calculated based on the weighted UniFrac distance matrix in the AGP data.
Figure 4
Computation time for a microbiome GWAS with 500,000 SNPs. “Main”: computation time for testing the main effect only. “All”: computation time for testing the main effect, the interaction, and the joint null hypothesis $H_0: \beta_M = 0, \beta_I = 0$.
Figure 5
Results of analyzing the microbiome GWAS data of 147 adjacent normal lung tissues in the EAGLE study. (A) Skewness and kurtosis for the main effect test using the unweighted and the weighted UniFrac distance matrices. (B) Quantile–quantile (QQ) plot for association p-values using the unweighted UniFrac distance matrix. “Adjusted”: p-values were corrected for skewness and kurtosis. “Unadjusted”: p-values were approximated based on the asymptotic distribution $N(0,1)$. (C) Quantile–quantile (QQ) plot for association p-values using the weighted UniFrac distance matrix. (D) Manhattan plots based on the unweighted or the weighted UniFrac distance matrices. (E) Box plots for the top nine loci in the microbiome GWAS analysis. Subject pairs are classified into three groups according to the genetic distance $|g_i - g_j|$ at the SNP. The y-coordinate is the microbiome distance.
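
The pairwise idea behind Figures 1 and 5E (microbiome distance compared across subject pairs grouped by genetic distance at a SNP) can be sketched as a Mantel-style permutation test. This is a simplified illustration under stated assumptions, not the score statistic or the analytic p-value used by microbiomeGWAS: the function names, the toy genotypes, and the toy distance matrix are all hypothetical, and genotypes are assumed to be coded as minor-allele counts 0, 1, or 2.

```python
import random
from itertools import combinations


def pairwise_correlation(dist, genotypes):
    """Pearson correlation, over subject pairs i < j, between the
    microbiome distance dist[i][j] and the genetic distance |g_i - g_j|."""
    pairs = list(combinations(range(len(genotypes)), 2))
    d = [dist[i][j] for i, j in pairs]
    g = [abs(genotypes[i] - genotypes[j]) for i, j in pairs]
    mean_d, mean_g = sum(d) / len(d), sum(g) / len(g)
    cov = sum((x - mean_d) * (y - mean_g) for x, y in zip(d, g))
    var_d = sum((x - mean_d) ** 2 for x in d)
    var_g = sum((y - mean_g) ** 2 for y in g)
    if var_d == 0 or var_g == 0:
        return 0.0  # no variation, so no measurable association
    return cov / (var_d * var_g) ** 0.5


def permutation_p_value(dist, genotypes, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffle genotypes across subjects
    and count how often the correlation reaches the observed value."""
    rng = random.Random(seed)
    observed = pairwise_correlation(dist, genotypes)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pairwise_correlation(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): nine subjects whose microbiome distance
# grows linearly with genetic distance at the SNP.
toy_genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
toy_dist = [
    [0.0 if i == j else 0.5 + 0.1 * abs(gi - gj)
     for j, gj in enumerate(toy_genotypes)]
    for i, gi in enumerate(toy_genotypes)
]
r, p = permutation_p_value(toy_dist, toy_genotypes)
```

Permutation is used here only because it handles the dependence among pairs that share a subject without any distributional assumption. It scales poorly to hundreds of thousands of SNPs and very small p-values, which is the setting where analytic tail approximations such as those described in the abstract become necessary.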
