Genomics Proteomics Bioinformatics. 2021 Oct;19(5):834-847.
doi: 10.1016/j.gpb.2020.06.015. Epub 2021 Feb 17.

kLDM: Inferring Multiple Metagenomic Association Networks Based on the Variation of Environmental Factors


Yuqing Yang et al. Genomics Proteomics Bioinformatics. 2021 Oct.

Abstract

Identification of significant biological relationships or patterns is central to many metagenomic studies. Methods that estimate association networks have been proposed for this purpose; however, they assume that associations are static, neglecting the fact that relationships in a microbial ecosystem may vary with changes in environmental factors (EFs), which can result in inaccurate estimates. Therefore, in this study, we propose a computational model, called the k-Lognormal-Dirichlet-Multinomial (kLDM) model, which estimates multiple association networks that correspond to specific environmental conditions and simultaneously infers microbe–microbe and EF–microbe associations for each network. The effectiveness of the kLDM model was demonstrated on synthetic data, a colorectal cancer (CRC) dataset, the Tara Oceans dataset, and the American Gut Project dataset. The results revealed that the widely used Spearman's rank correlation coefficient (SCC) method performed much worse than the other methods, indicating the importance of separating samples by environmental condition. When cancer fecal samples were compared with cancer-free samples, the estimates obtained by kLDM exhibited fewer associations among microbes but stronger associations between specific bacteria, especially five CRC-associated operational taxonomic units (OTUs), indicating gut microbe translocation in cancer patients. Some EF-dependent associations were then found within a marine eukaryotic community. Finally, gut microbial heterogeneity among inflammatory bowel disease patients was detected. These results demonstrate that kLDM can elucidate the complex associations within microbial ecosystems. The kLDM program and the R and Python scripts, together with all experimental datasets, are accessible at https://github.com/tinglab/kLDM.git.

Keywords: Association inference; Bayesian model; Clustering; Environmental condition; Metagenomics.
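As a toy illustration of why pooling samples across EF conditions can hurt a correlation-based method such as SCC (hypothetical data, not from the paper): if two OTUs are positively associated under one condition and negatively associated under another, the per-condition Spearman correlations are +1 and -1, but the correlation computed on the pooled samples cancels to 0. The helper names below are our own, and only the Python standard library is used.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical abundances of two OTUs under two EF conditions.
otu_a = list(range(1, 11))
cond1_b = list(otu_a)              # positive association under condition 1
cond2_b = [11 - v for v in otu_a]  # negative association under condition 2

per_condition = (spearman(otu_a, cond1_b), spearman(otu_a, cond2_b))
pooled = spearman(otu_a + otu_a, cond1_b + cond2_b)
print(per_condition, pooled)  # prints (1.0, -1.0) 0.0
```

Real data are noisier and the conditions are unknown beforehand, which is why kLDM infers the conditions rather than taking them as given.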


Conflict of interest statement

All authors declare no conflicts of interest.

Figures

Figure 1
Schema of the kLDM model. A. Multiple environmental conditions are assumed to exist in real environments, and the EF condition can change over time. An EF condition refers to 1) a group of samples whose EF values fall within a small, well-defined range, and 2) in which interactions within the microbial community are stable. B. Sequencing samples with related metadata, possibly from multiple EF conditions, were collected. After data preprocessing, clustering, and annotation, OTU counts were obtained for each sample. Which samples belong to the same EF condition was not known beforehand. C. The kLDM graphical model assumes K EF conditions within N samples and infers the number of EF conditions, the associations among OTUs, and the associations between EFs and OTUs under every EF condition. Two matrices, $B_k$ and $\Theta_k$, record the direct EF–OTU associations and the OTU–OTU associations, respectively, for the kth EF condition. Vectors $x_i$ and $m_i$ are the OTU counts and the EF values, respectively, of the ith sample. Under every EF condition, the EF values are assumed to follow a multivariate Gaussian distribution parameterized by $\mu_k$ and $\Sigma_k$. The remaining variables are as follows: $h_i$ represents the latent relative ratios of microbes in the ith sample, $\alpha_i$ is the absolute abundance of microbes, $B_0^{(k)}$ represents the impact of unknown factors that affect the abundance of OTUs, $c_i$ indicates the EF condition to which the ith sample belongs, and $\pi_k$ is the mixture weight of the kth EF condition. D. Compositional bias caused by normalizing the OTU counts: after normalization, the relative abundances of microbes sum to one. E. Because kLDM takes EF–OTU associations into account, it can recognize the indirect association between OTU-1 and OTU-2 that is induced by their common factor EF-1. F. kLDM estimates the number of EF conditions and the association network of every EF condition with a split-merge optimization algorithm; both the EF values and the associations among microbes are used to determine the EF condition. G. Parameters estimated by kLDM can be visualized as EF conditions and association networks. Blue and yellow edges correspond to negative and positive associations, respectively, and edge thickness is proportional to the association value. EF, environmental factor; OTU, operational taxonomic unit; OTU–OTU, microbe–microbe; EF–OTU, environmental factor–microbe.
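Panel C assumes that, under EF condition k, the EF vector of sample i follows a multivariate Gaussian with mean μ_k and covariance Σ_k, and that condition k has mixture weight π_k. Looking at the EF part alone, the posterior probability that sample i belongs to condition k is then proportional to π_k · N(m_i; μ_k, Σ_k). The sketch below computes that quantity under two simplifying assumptions of ours: diagonal covariances, and no use of the OTU counts. Per the caption, kLDM also uses the microbial associations (through its split-merge algorithm) to determine the conditions, so this is not kLDM's actual assignment rule. The function names, conditions, and EF values are hypothetical.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(m: Sequence[float], mu: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a multivariate Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - u) ** 2 / v)
        for x, u, v in zip(m, mu, var)
    )


def condition_posteriors(m: Sequence[float], weights: Sequence[float],
                         means: Sequence[Sequence[float]],
                         variances: Sequence[Sequence[float]]) -> List[float]:
    """P(c_i = k | m_i), proportional to pi_k * N(m_i; mu_k, Sigma_k)."""
    logs = [
        math.log(w) + log_gaussian_diag(m, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp trick avoids underflow of tiny densities
    unnorm = [math.exp(lp - top) for lp in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two hypothetical EF conditions over two EFs (e.g., temperature and pH).
weights = [0.5, 0.5]
means = [[10.0, 7.0], [25.0, 8.0]]
variances = [[4.0, 0.1], [4.0, 0.1]]

posterior = condition_posteriors([11.0, 7.1], weights, means, variances)
print(posterior)
```

Working in log space matters because Gaussian densities of EF vectors far from a condition's mean underflow quickly; subtracting the largest log term before exponentiating keeps the normalization stable.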
Figure 2
Comparison of the performance of kLDM with other methods on synthetic data. The performance of kLDM was compared with that of three other association inference methods (SCC, CCLasso, and SPIEC-EASI). P, Q, K, and N represent the numbers of OTUs, EFs, EF conditions (clusters), and samples per EF condition, respectively. P, Q, and K were fixed for all panels (P = 50, Q = 5, K = 2). SCC(all) is the result of SCC when assuming that there is only one EF condition in the dataset. A. and C. ROC curves (A) and AUC values (C) with N ∈ [100, 200]. B. and D. ROC curves (B) and AUC values (D) with N ∈ [200, 400]. The ROC curves of the OTU–OTU and EF–OTU associations of the two clusters are plotted in order. The red line corresponds to the result of kLDM. SCC, Spearman's rank correlation coefficient; ROC, receiver operating characteristic; AUC, area under the curve.
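Figure 2 scores association recovery with ROC curves and AUC values. The caption does not state how candidate edges are scored, so the sketch below assumes that each OTU pair is ranked by the absolute value of its estimated association and counts as a true edge when the simulated association is nonzero. The AUC is then the Mann–Whitney probability that a randomly chosen true edge outranks a randomly chosen non-edge, which equals the area under the ROC curve. The function `edge_auc` and the 4-OTU matrices are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def edge_auc(estimated: Sequence[Sequence[float]],
             truth: Sequence[Sequence[float]]) -> float:
    """AUC for recovering the true edges of a symmetric association matrix.

    Each off-diagonal pair (i, j) with i < j is scored by |estimated[i][j]|
    and labeled positive when truth[i][j] != 0. Ties count as one half.
    """
    pos, neg = [], []
    for i, j in combinations(range(len(truth)), 2):
        score = abs(estimated[i][j])
        (pos if truth[i][j] != 0 else neg).append(score)
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = sum(1.0 if s > t else 0.5 if s == t else 0.0
               for s in pos for t in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical network: true edges are OTU0-OTU1 and OTU2-OTU3.
truth = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
estimated = [[0.0, 0.8, 0.4, 0.0],
             [0.8, 0.0, 0.1, 0.05],
             [0.4, 0.1, 0.0, 0.3],
             [0.0, 0.05, 0.3, 0.0]]

auc = edge_auc(estimated, truth)
print(auc)  # prints 0.875
```

For EF–OTU associations, which form a rectangular Q × P matrix such as $B_k$, every entry rather than the upper triangle would be scored in the same way.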
Figure 3
Evaluation of the scalability of kLDM with increasing numbers of microbes and EF conditions. A. AUC values with 100 OTUs, 8 EFs, and 2 clusters; the number of samples per cluster ranges from 400 to 800. B. AUC values with 200 OTUs, 10 EFs, and 2 clusters; the number of samples per cluster ranges from 800 to 1600. C. AUC values with 50 OTUs, 5 EFs, and 3 clusters; the number of samples per cluster ranges from 200 to 400. D. AUC values with 50 OTUs, 5 EFs, and 4 clusters; the number of samples per cluster ranges from 200 to 400.
