Bioinformatics. 2015 Oct 15;31(20):3322-9.
doi: 10.1093/bioinformatics/btv364. Epub 2015 Jun 16.

Investigating microbial co-occurrence patterns based on metagenomic compositional data

Yuguang Ban et al. Bioinformatics.

Abstract

Motivation: High-throughput sequencing technologies have provided a powerful tool for studying the microbial organisms living in various environments. Characterizing microbial interactions can give us insights into how they live and work together as a community. Metagenomic data are usually summarized in a compositional fashion because sampling/sequencing depths vary from one sample to another. We study the co-occurrence patterns of microbial organisms using their relative abundance information. Analyzing compositional data with conventional correlation methods has been shown to be prone to bias that leads to artifactual correlations.
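The bias mentioned above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: it assumes four taxa with independent log-normal absolute ("basis") abundances, and the helper names `closure` and `mean_offdiag_corr` are hypothetical. Because each sample's proportions must sum to one, the proportions come out negatively correlated even though the underlying abundances are independent.

```python
import numpy as np

def closure(abundances):
    """Convert each row of absolute abundances to proportions that sum to 1."""
    abundances = np.asarray(abundances, dtype=float)
    return abundances / abundances.sum(axis=1, keepdims=True)

def mean_offdiag_corr(data):
    """Average Pearson correlation over all distinct pairs of columns."""
    corr = np.corrcoef(data, rowvar=False)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    return corr[off_diagonal].mean()

# Assumption for illustration: 2000 samples, 4 independent taxa.
rng = np.random.default_rng(0)
basis = rng.lognormal(mean=0.0, sigma=1.0, size=(2000, 4))
props = closure(basis)

basis_corr = mean_offdiag_corr(basis)   # close to zero: taxa are independent
props_corr = mean_offdiag_corr(props)   # clearly negative: induced by the unit-sum constraint

print(f"mean correlation, basis abundances: {basis_corr:+.3f}")
print(f"mean correlation, proportions:      {props_corr:+.3f}")
```

With few taxa the induced negative correlation is large; it shrinks as the number of taxa grows, but it does not vanish for dominant taxa.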

Results: We propose a novel method, regularized estimation of the basis covariance based on compositional data (REBACCA), to identify significant co-occurrence patterns by finding sparse solutions to a rank-deficient system. Specifically, we construct the system using log ratios of count or proportion data and solve it using the l1-norm shrinkage method. Our comprehensive simulation studies show that REBACCA (i) generally achieves higher accuracy than the existing methods when a sparsity condition is satisfied; (ii) controls false positives at a pre-specified level, while other methods fail to do so in various cases; and (iii) runs considerably faster than the existing comparable method. REBACCA is also applied to several real metagenomic datasets.
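The general idea of a log-ratio system solved with l1 shrinkage can be sketched as follows. This is an illustrative toy, not the authors' REBACCA implementation (their R code is linked below). It assumes the standard identity Var(log(x_i/x_j)) = w_ii + w_jj - 2 w_ij for the basis covariance w, penalizes only the off-diagonal covariances, and uses a plain proximal-gradient (ISTA) solver. The function names, the penalty `lam` and the simulated covariance are all assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def log_ratio_variances(props):
    """Sample variance of log(x_i / x_j) for every pair of taxa."""
    logp = np.log(props)
    pairs = list(combinations(range(props.shape[1]), 2))
    t = np.array([np.var(logp[:, i] - logp[:, j], ddof=1) for i, j in pairs])
    return pairs, t

def build_system(n_taxa, pairs):
    """Matrix A with A @ w = t, where t_ij = w_ii + w_jj - 2 * w_ij.
    Unknowns: n_taxa variances first, then one covariance per pair."""
    A = np.zeros((len(pairs), n_taxa + len(pairs)))
    for row, (i, j) in enumerate(pairs):
        A[row, i] = 1.0
        A[row, j] = 1.0
        A[row, n_taxa + row] = -2.0
    return A

def l1_solve(A, t, n_unpenalized, lam=0.05, n_iter=5000):
    """ISTA for 0.5 * ||A w - t||^2 + lam * ||w_cov||_1 (variances unpenalized)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = w - step * (A.T @ (A @ w - t))          # gradient step on the squared error
        cov = w[n_unpenalized:]
        w[n_unpenalized:] = np.sign(cov) * np.maximum(np.abs(cov) - step * lam, 0.0)
    return w

# Toy data (assumption): 6 taxa, unit variances, one correlated pair (0, 1).
rng = np.random.default_rng(1)
n_taxa = 6
omega = np.eye(n_taxa)
omega[0, 1] = omega[1, 0] = 0.6
log_basis = rng.multivariate_normal(np.zeros(n_taxa), omega, size=2000)
basis = np.exp(log_basis)
props = basis / basis.sum(axis=1, keepdims=True)    # only proportions are observed

pairs, t = log_ratio_variances(props)
A = build_system(n_taxa, pairs)
rank = np.linalg.matrix_rank(A)                     # fewer equations than unknowns
w = l1_solve(A, t, n_unpenalized=n_taxa)
cov_est = w[n_taxa:]
best_pair = pairs[int(np.argmax(np.abs(cov_est)))]

print("equations:", A.shape[0], "unknowns:", A.shape[1], "rank:", rank)
print("pair with largest estimated covariance:", best_pair)
```

The system has 15 equations for 21 unknowns, so it cannot be solved uniquely without an extra assumption; the l1 penalty supplies one by preferring solutions in which most covariances are zero.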

Availability and implementation: The R codes for the proposed method are available at http://faculty.wcas.northwestern.edu/∼hji403/REBACCA.htm

Contact: hongmei@northwestern.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Comparison of methods based on AUC. Each boxplot represents the AUC values calculated on 100 simulated datasets. Data are generated using the LRN method with 100 samples, based on three structures of the basis covariance: ‘Case 1’ with a hierarchical structure, ‘Case 2’ consisting of four inter-correlated taxa with mostly negative correlations, and ‘Case 3’ with three clustered groups. Two types of mean basis abundance are used in the simulation, such that the average basis abundances are ‘equal’ or ‘unequal’ across OTUs. Four methods are compared: the conventional correlation measure with resampling (BP), ReBoot, SparCC and REBACCA.
Fig. 2.
Comparison of methods based on their control of Type I error. Dashed lines represent the targeted FPR of 0.05; that is, a pair is considered correlated if the corresponding P-value is <0.05. Bars represent mean FPRs over 100 simulated datasets for each situation. Data are generated using the LRN method with 100 samples.
Fig. 3.
Comparison of methods based on their power (sensitivity) and FPR (1-specificity). A sequence of cutoffs at intervals of 0.01 is used to calculate the sensitivity and specificity. Points represent the average of the results over 100 simulated datasets in each situation. Data are generated using the LRN method with 100 samples.
Fig. 4.
Basis correlations between core OTUs. Correlations calculated from three types of mouse skin microbiota samples are shown: (a) 606 correlated pairs from non-immunized (Control) samples, (b) 1329 pairs from immunized but healthy (Healthy) samples and (c) 818 pairs from immunized samples that developed EBA disease (EBA). Within-phylum correlations are shown in the square areas for Firmicutes (F.), Proteobacteria (P.), Bacteroidetes (B.), Actinobacteria (A.) and Cyanobacteria (C.). Correlated pairs are identified with FWER controlled at 0.03. (d) Venn diagram of consistent correlated OTUs from the Control, Healthy and EBA samples. Significantly more correlated pairs are consistent between the two immunized groups than in the other comparisons.
