PLoS One. 2024 Jun 28;19(6):e0305032.
doi: 10.1371/journal.pone.0305032. eCollection 2024.

Compositionally aware estimation of cross-correlations for microbiome data


Ib Thorsgaard Jensen et al. PLoS One.

Abstract

In the field of microbiome studies, it is of interest to infer correlations between abundances of different microbes (here referred to as operational taxonomic units, OTUs). Several existing methods take the compositional nature of sequencing data into account. However, these methods cannot infer correlations between OTU abundances and other variables. In this paper we introduce two novel methods, SparCEV (Sparse Correlations with External Variables) and SparXCC (Sparse Cross-Correlations between Compositional data), for quantifying correlations between OTU abundances and either continuous phenotypic variables or components of other compositional datasets, such as transcriptomic data. SparCEV and SparXCC both assume that the average correlation in the dataset is zero. Iterative versions of SparCEV and SparXCC are proposed to alleviate bias resulting from deviations from this assumption. We compare these new methods to empirical Pearson cross-correlations after applying naive transformations of the data (log and log-TSS). Additionally, we test the centered log-ratio transformation (CLR) and the variance stabilising transformation (VST). We find that CLR and VST outperform the naive transformations, except when the correlation matrix is dense. SparCEV and SparXCC outperform CLR and VST when the number of OTUs is small and perform similarly to CLR and VST for large numbers of OTUs. Adding the iterative procedure increases the accuracy of SparCEV and SparXCC in all cases except when the average correlation in the dataset is close to zero or the correlation matrix is dense. These results are consistent with our theoretical considerations.
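The baseline comparisons named in the abstract (log-TSS or CLR followed by Pearson correlation) take only a few lines of NumPy. The sketch below is an illustration, not the authors' SparCEV implementation; the helper names, the pseudocount of 1, and the simulated data are all assumptions. It also checks a standard CLR identity that explains why an assumption about the average correlation matters. The CLR value clr_i = log x_i − (1/p) Σ_j log x_j does not depend on the unknown sample total, so cov(clr_i, y) = cov(log x_i, y) − (1/p) Σ_j cov(log x_j, y). CLR-based estimates therefore match those on absolute log-abundances only when the average covariance with y is zero.

```python
import numpy as np

def log_tss(counts, pseudocount=1.0):
    """Log of total-sum-scaled abundances; rows are samples, columns are OTUs."""
    c = np.asarray(counts, dtype=float) + pseudocount
    return np.log(c / c.sum(axis=1, keepdims=True))

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: each log abundance minus the sample's mean log abundance."""
    logs = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logs - logs.mean(axis=1, keepdims=True)

def cross_corr(a, b):
    """Pearson correlations between the columns of a (n x p) and b (n x q)."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return a.T @ b / a.shape[0]

def cov_with(mat, v):
    """Covariance (divisor n) of each column of mat with the vector v."""
    return ((mat - mat.mean(axis=0)) * (v - v.mean())[:, None]).mean(axis=0)

# Hypothetical simulation: absolute log-abundances and one phenotype.
rng = np.random.default_rng(0)
n, p = 50, 20
log_x = rng.normal(size=(n, p))
y = log_x[:, 0] + rng.normal(size=n)       # phenotype linked to OTU 0 only
w = np.exp(log_x)
w /= w.sum(axis=1, keepdims=True)          # sequencing only reveals proportions

c_abs = cov_with(log_x, y)                 # what we would like to estimate
c_clr = cov_with(clr(w, pseudocount=0.0), y)
# CLR covariances equal the absolute ones shifted by their average over OTUs.
print(np.allclose(c_clr, c_abs - c_abs.mean()))   # True
```

The log-TSS transform has a similar problem: its covariances are shifted by cov(log total, y), which depends on the unobserved totals. This is one way to see why the naive transformations can be biased.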


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Results on simulated data in case B.
MAE of different cross-correlation methods for correlation matrices generated by the cluster method (left column) and the loadings method (right column). For the cluster method, different p (number of OTUs) and c (the proportion of OTUs in a cluster) are used. For the loadings method, threshold values u = 0, 0.1, …, 0.8 (cf. (7)) and different p are used. The lines show the mean MAE, and the edges of the envelopes show ±1.96 standard errors (SE). The results are based on 1000 simulated datasets, each with 50 replicates.
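The caption summarises each method by the mean MAE over simulated datasets, with an envelope of ±1.96 SE. A minimal sketch of those two computations follows. It assumes that the MAE is averaged over all entries of the estimated cross-correlation matrix and that the SE is the sample standard deviation of the per-dataset MAEs divided by the square root of the number of datasets. The function names are hypothetical.

```python
import numpy as np

def mae(estimated, true):
    """Mean absolute error over all entries of an estimated correlation matrix."""
    return float(np.mean(np.abs(np.asarray(estimated) - np.asarray(true))))

def mean_with_envelope(errors, z=1.96):
    """Mean MAE over simulated datasets with a +/- z * SE band around it."""
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    se = errors.std(ddof=1) / np.sqrt(errors.size)   # standard error of the mean
    return mean, mean - z * se, mean + z * se

# Toy usage with made-up numbers (not results from the paper).
single = mae([[0.1, -0.2]], [[0.0, 0.0]])
m, lo, hi = mean_with_envelope([0.2, 0.3, 0.4])
```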
Fig 2. Results for simulated data with differing diversity in case B.
MAE of different cross-correlation methods for correlation matrices generated by the cluster method (left column) and the loadings method (right column). For the cluster method, different p_eff (effective number of OTUs) and c (the proportion of OTUs in a cluster) are used. For the loadings method, threshold values u = 0, 0.1, …, 0.8 (cf. (7)) and different p_eff are used. The lines show the mean MAE, and the edges of the envelopes show ±1.96 SE. The results are based on 1000 simulated datasets, each with 50 replicates.
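This excerpt does not define the effective number of OTUs, p_eff. One common choice is the Hill number of order 1, which is the exponential of the Shannon entropy of the relative abundances. That definition is assumed here and is not confirmed by the text. It equals p when all OTUs are equally abundant and shrinks towards 1 as a few OTUs dominate. The function name is hypothetical.

```python
import numpy as np

def effective_number(abundances):
    """Hill number of order 1: exp(Shannon entropy) of the relative abundances."""
    a = np.asarray(abundances, dtype=float)
    prop = a / a.sum()
    prop = prop[prop > 0]                  # treat 0 * log(0) as 0
    return float(np.exp(-(prop * np.log(prop)).sum()))

even = effective_number([5, 5, 5, 5])      # four equally abundant OTUs
skewed = effective_number([90, 5, 5])      # one dominant OTU
```

Under the same assumption, the "effective number of families" in Fig 3B would be computed in the same way from family-level abundances.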
Fig 3. Correlations between microbial abundances and the severity of atopic dermatitis.
Results from a correlation analysis on atopic dermatitis data from Byrd et al. [38]. A: All correlations exceeding the permutation threshold m = 0.59, colored by the sign of the correlation, with error bars given by the empirical bootstrap 95% confidence interval. B: Scatter plot of the effective number of families against the objective SCORAD. The blue line is a smooth curve fitted to the data, with a 95% confidence band derived from the standard deviation. C: Scatter plot of the correlations estimated using log-TSS against those estimated using SparCEV. The straight line has slope 1 and intercept 0. D: Scatter plot of the correlations estimated using SparCEV base against those estimated using SparCEV iterative. The straight line has slope 1 and intercept 0.
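This excerpt does not say how the permutation threshold m = 0.59 or the bootstrap intervals were computed. The sketch below shows one common scheme, and every detail of it is an assumption. The phenotype is shuffled many times, the largest absolute correlation over all OTUs is recorded for each shuffle, and the 95% quantile of those maxima is used as the threshold. "Empirical bootstrap" is read here as the basic bootstrap interval, (est − q_hi, est − q_lo), where q_lo and q_hi are quantiles of the bootstrap deviations from the estimate. Plain Pearson correlation on already-transformed abundances stands in for SparCEV, and all names and data are hypothetical.

```python
import numpy as np

def column_corr(mat, v):
    """Pearson correlation of each column of mat (n x p) with v (length n)."""
    mz = (mat - mat.mean(axis=0)) / mat.std(axis=0)
    vz = (v - v.mean()) / v.std()
    return mz.T @ vz / len(v)

def permutation_threshold(mat, v, n_perm=200, level=0.95, seed=0):
    """Quantile of max |r| over OTUs after shuffling v (breaks any true link)."""
    rng = np.random.default_rng(seed)
    maxima = [np.abs(column_corr(mat, rng.permutation(v))).max()
              for _ in range(n_perm)]
    return float(np.quantile(maxima, level))

def empirical_bootstrap_ci(mat, v, j, n_boot=500, level=0.95, seed=0):
    """Basic (empirical) bootstrap interval for the correlation of column j with v."""
    rng = np.random.default_rng(seed)
    n = len(v)
    est = column_corr(mat, v)[j]
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample samples with replacement
        deltas.append(column_corr(mat[idx], v[idx])[j] - est)
    lo_q, hi_q = np.quantile(deltas, [(1 - level) / 2, (1 + level) / 2])
    return est - hi_q, est - lo_q

# Hypothetical data: 40 samples, 15 transformed abundances, one severity score.
rng = np.random.default_rng(1)
mat = rng.normal(size=(40, 15))
score = 0.8 * mat[:, 2] + rng.normal(scale=0.5, size=40)
m = permutation_threshold(mat, score)
r = column_corr(mat, score)
significant = np.flatnonzero(np.abs(r) > m)       # OTUs that would appear in panel A
ci = empirical_bootstrap_ci(mat, score, j=2)
```

Taking the maximum over OTUs in each permutation controls the chance that any null correlation crosses the threshold, which suits a plot that shows only the correlations above it.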
Fig 4. Results for simulated data in case C.
MAE of different cross-correlation methods for correlation matrices generated by the cluster method (left column) and the loadings method (right column) in case C. For the cluster method, different p (number of OTUs), q (number of genes) and c (the proportion of OTUs in a cluster) are used. For the loadings method, threshold values u = 0, 0.1, …, 0.8 (cf. (7)) and different p and q are used. The lines show the mean MAE, and the edges of the envelopes show ±1.96 standard errors (SE). The results are based on 200 simulated datasets, each with 50 replicates.
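In case C, both variables are compositional (OTU counts and gene counts). A natural baseline of the kind SparXCC is compared against is to CLR-transform each table separately and then compute the p × q matrix of Pearson cross-correlations. The sketch below is that baseline only, not SparXCC. The pseudocount of 1, the function names, and the Poisson toy data are assumptions.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples x features count table."""
    logs = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logs - logs.mean(axis=1, keepdims=True)

def clr_cross_correlation(counts_a, counts_b, pseudocount=1.0):
    """p x q Pearson correlations between CLR-transformed columns of two tables."""
    a = clr(counts_a, pseudocount)
    b = clr(counts_b, pseudocount)
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return a.T @ b / a.shape[0]

# Hypothetical toy data: 50 samples, 30 OTUs and 40 genes.
rng = np.random.default_rng(2)
otu_counts = rng.poisson(20, size=(50, 30))
gene_counts = rng.poisson(100, size=(50, 40))
r = clr_cross_correlation(otu_counts, gene_counts)
```

Each table is closed and centred on its own, so the bias discussed for the single-variable case can now enter from both sides of the cross-correlation.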
Fig 5. Correlation network between bacterial and fungal abundances in the root of Lotus japonicus.
Results from applying SparXCC to 16S and ITS sequencing data on the root microbiome of Lotus japonicus from Thiergart et al. [20]. Each circular vertex represents a bacterial OTU from the 16S data, and each square vertex represents a fungal OTU from the ITS data. Vertices are colored by the phylum of the OTU they represent. Two vertices are connected by an edge if their estimated correlation is above the permutation threshold. The analysis is carried out separately for the genotypes Gifu, ram1, nfr5, ccamk, and symrk. Only cross-correlations are shown.
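A bipartite network like the one in this figure can be built directly from an estimated cross-correlation matrix and a threshold. The sketch below assumes that "above the threshold" means the absolute correlation exceeds it, because the figures color edges by sign. The function name, OTU labels, and numbers are hypothetical. It would be run once per genotype.

```python
import numpy as np

def threshold_edges(corr, threshold, row_names, col_names):
    """Edges (bacterium, fungus, r) for every cross-correlation with |r| > threshold."""
    corr = np.asarray(corr)
    rows, cols = np.nonzero(np.abs(corr) > threshold)
    return [(row_names[i], col_names[j], float(corr[i, j])) for i, j in zip(rows, cols)]

# Hypothetical 2 x 2 cross-correlation matrix (bacteria x fungi).
corr = [[0.70, 0.10],
        [-0.65, 0.20]]
edges = threshold_edges(corr, 0.59, ["b1", "b2"], ["f1", "f2"])
```

Only bacterium-fungus pairs are tested, so the result contains cross-correlation edges only, matching the figure.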

References

1. McCombie WR, McPherson JD, Mardis ER. Next-Generation Sequencing Technologies. Cold Spring Harbor Perspectives in Medicine. 2019;9(11):a036798. doi: 10.1101/cshperspect.a036798
2. Uhlen M, Zhang C, Lee S, Sjöstedt E, Fagerberg L, Bidkhori G, et al. A pathology atlas of the human cancer transcriptome. Science. 2017;357(6352):eaan2507. doi: 10.1126/science.aan2507
3. Casamassimi A, Federico A, Rienzo M, Esposito S, Ciccodicola A. Transcriptome Profiling in Human Diseases: New Advances and Perspectives. International Journal of Molecular Sciences. 2017;18(8):1652. doi: 10.3390/ijms18081652
4. Ehrhart F, Coort SL, Eijssen L, Cirillo E, Smeets EE, Bahram Sangani N, et al. Integrated analysis of human transcriptome data for Rett syndrome finds a network of involved genes. The World Journal of Biological Psychiatry. 2020;21(10):712–725. doi: 10.1080/15622975.2019.1593501
5. Cani PD. Human gut microbiome: hopes, threats and promises. Gut. 2018;67(9):1716–1725. doi: 10.1136/gutjnl-2018-316723