Commun Biol. 2025 Jul 25;8(1):1100.
doi: 10.1038/s42003-025-08515-9.

A systematic benchmark of integrative strategies for microbiome-metabolome data

Loïc Mangnier et al. Commun Biol. 2025.

Abstract

The rapid advancement of high-throughput sequencing technologies has enabled the integration of various omic layers into computational frameworks. Among these, metagenomics and metabolomics are increasingly studied for their roles in complex diseases. However, no standard currently exists for jointly integrating microbiome and metabolome datasets within statistical models. We benchmarked nineteen integrative methods to disentangle the relationships between microorganisms and metabolites. These methods address key research goals, including global associations, data summarization, individual associations, and feature selection. Through realistic simulations, we identified the best-performing methods and validated them on real gut microbiome datasets, revealing complementary biological processes across the two omic layers. Practical guidelines are provided for specific scientific questions and data types. This work establishes a foundation for research standards in metagenomics-metabolomics integration and supports future methodological developments, while also providing guidance for designing optimal analytical strategies tailored to specific integration questions.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of the simulation setup based on real datasets.
(A) Three microbiome-metabolome datasets were selected, each exhibiting different data structures and correlations. The sample size (N) and the number of features (P) are reported as N × P for each dataset. (B) Realistic datasets were simulated using the “Normal-to-Anything” (NORtA) framework. First, sparse microbiome and metabolome correlation networks were estimated using SpiecEasi. Second, correlated multivariate Gaussian distributions were generated for both the microbiome and metabolome datasets using the correlation structures estimated in the previous step. Third, the Gaussian distributions were converted into arbitrary distributions matching the original data structures. (C) Associations between species and metabolites were specified, mimicking the complex entanglement between the two omic layers. For each dataset, the proportion of associated features varies between 1% and 10%, with association strengths drawn at random from a Gaussian distribution.
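The three steps in panel B can be illustrated with a minimal sketch. It is not the authors' simulation code: the correlation matrix is supplied by hand rather than estimated with SpiecEasi, and the two marginals (`exponential_quantile`, `zero_inflated_quantile`) are hypothetical stand-ins for the distributions fitted to the real data.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def exponential_quantile(u):
    """Inverse CDF of a unit-rate exponential (hypothetical metabolite marginal)."""
    return -math.log(1.0 - u)


def zero_inflated_quantile(u, zero_fraction=0.6):
    """Inverse CDF of a zero-inflated exponential (hypothetical taxon marginal)."""
    if u < zero_fraction:
        return 0.0
    return -math.log(1.0 - (u - zero_fraction) / (1.0 - zero_fraction))


def norta_sample(corr, inverse_cdfs, n_samples, seed=0):
    """Draw samples whose marginals follow inverse_cdfs and whose dependence
    is induced by a latent Gaussian with correlation matrix corr."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    std_normal = NormalDist()
    dim = len(corr)
    samples = []
    for _ in range(n_samples):
        # Step 2: correlated standard Gaussians via the Cholesky factor.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        latent = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        # Step 3: Gaussian -> uniform -> target marginal (clamped away from 0 and 1).
        uniforms = [min(max(std_normal.cdf(v), 1e-12), 1.0 - 1e-12) for v in latent]
        samples.append([inv(u) for inv, u in zip(inverse_cdfs, uniforms)])
    return samples


samples = norta_sample(
    [[1.0, 0.7], [0.7, 1.0]],
    [exponential_quantile, zero_inflated_quantile],
    n_samples=2000,
    seed=1,
)
```

The correlation of the transformed variables generally differs somewhat from the latent Gaussian correlation, so the value 0.7 here controls the dependence only indirectly.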
Fig. 2. Performance of the multivariate methods for global associations or data summarization.
(A) QQ-plot of the Mantel test and the Procrustes Analysis across microbiome normalizations and distance kernels. For the Mantel test, Spearman’s method was used to compute the global association between the two datasets. P values for both the Mantel test and the Procrustes Analysis were obtained empirically based on 1000 replicates. (B) Power of the Mantel test and the Procrustes Analysis across microbiome normalizations and distance kernels. For the Mantel test, Spearman’s method was used to compute the global association between the two datasets. P values for both the Mantel test and the Procrustes Analysis were obtained empirically based on 1000 replicates. P values ≤ 0.05 were considered significant. (C) QQ-plot of MMiRKAT, the Mantel test, and the Procrustes Analysis across microbiome normalizations and distance kernels. Points below the straight line correspond to what the Results section describes as conservative behavior. To accommodate MMiRKAT, which requires fewer features than samples, we considered scenarios in which both omic layers have fewer features than there are individuals (see Supplementary Methods). (D) Power of MMiRKAT, the Mantel test, and the Procrustes Analysis across microbiome normalizations and distance kernels. To accommodate MMiRKAT, which requires fewer features than samples, we considered scenarios in which both omic layers have fewer features than there are individuals (see Supplementary Methods). P values for both the Mantel test and the Procrustes Analysis were obtained empirically based on 1000 replicates. P values ≤ 0.05 were considered significant. (E) Proportion of explained variance for the data summarization methods across different data structures and normalizations, using the log-transformed metabolome. Data summarization methods were compared in scenarios where the number of features is half the number of individuals (see Supplementary Methods).
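As an illustration of the Mantel test with Spearman’s method, the sketch below correlates the ranked upper triangles of two distance matrices and derives an empirical one-sided P value by jointly permuting the rows and columns of one matrix. It is a generic textbook version, not the benchmark's implementation; the function names and the use of 1000 permutations as the default are assumptions made for the example.

```python
import random


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def upper_triangle(dist, order):
    """Upper-triangle entries of dist after relabeling samples by order."""
    n = len(order)
    return [dist[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_spearman(dist_a, dist_b, n_perm=1000, seed=0):
    """Return (Spearman Mantel statistic, one-sided permutation P value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    ranks_a = average_ranks(upper_triangle(dist_a, identity))
    observed = pearson(ranks_a, average_ranks(upper_triangle(dist_b, identity)))
    exceed = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute the sample labels of the second matrix
        stat = pearson(ranks_a, average_ranks(upper_triangle(dist_b, perm)))
        if stat >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy input: the second matrix is a monotone transform of the first.
positions = list(range(10))
dist_a = [[abs(p - q) for q in positions] for p in positions]
dist_b = [[(p - q) ** 2 for q in positions] for p in positions]
statistic, p_value = mantel_spearman(dist_a, dist_b)
```

Because the statistic is rank-based, any monotone transform of one distance matrix leaves it unchanged, which is why the toy example yields a statistic of essentially 1.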
Fig. 3. Performance of the individual association methods for compositional predictors.
To accommodate the long running times caused by the number of species-metabolite pairs, we considered scenarios where the number of features is half the number of individuals (see Supplementary Methods). (A) QQ-plots of the individual association methods across our three simulation settings. (B) Power of the individual association methods across our two main scenarios. P values ≤ 0.05 were considered significant. For the CLR-lm method and HALLA, P values were combined using ACAT to allow a comparison with the log-contrast regression and MiRKAT (see Methods). For MiRKAT, we report Type-I error rate and power using the ILR-transformed microbiome data and the log-transformed metabolites, while for HALLA, we used the CLR-transformed microbiome and the log-transformed metabolome. The straight line represents the background ACAT-combined power using Spearman’s correlation on the CLR-transformed microbiome and the log-transformed metabolome. Power was averaged over 1000 replicates.
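Two operations named in this caption, the centered log-ratio (CLR) transform and the ACAT (Cauchy) combination of P values, can be sketched as follows. This is a minimal illustration under stated assumptions: the pseudocount of 0.5 used to handle zero counts and the equal default weights are choices made for the example, not settings taken from the paper.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    The pseudocount (an assumption here) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def acat(p_values, weights=None):
    """Combine P values with the aggregated Cauchy association test.

    Each P value is mapped to a Cauchy variate tan((0.5 - p) * pi); the
    weighted mean of these is converted back with the Cauchy tail function.
    P values are assumed to lie strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total
    return 0.5 - math.atan(statistic) / math.pi


clr_values = clr([10, 0, 5, 85])
combined = acat([0.01, 0.9])
```

The combined P value is dominated by the smallest inputs, which is what makes the method suitable for summarizing many pairwise tests into one value per feature.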
Fig. 4. Performance (Sparsity (Spa.), Sensitivity (Sens.), Specificity (Spe.)) of the feature selection methods for providing a sparse and reliable subset of elements across our two scenarios.
(A) Performance of univariate feature selection methods considering microorganisms as covariates across our three settings. Metabolites were log-transformed before running the methods. Performance was calculated over 100 replicates. For CODA-LASSO in the Konzo scenario, we adapted the simulation setting, selecting 300 species and 600 metabolites to accommodate the running time of the method (see Supplementary Methods). (B) Performance of multivariate feature selection methods. Metabolites were log-transformed before running the methods. sPLS-Reg1 and sPLS-Reg2 correspond to sPLS-Reg with X = microbiome and X = metabolome, respectively.
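The three criteria can be computed from a selected feature set and the set of truly associated features, as in the sketch below. Sensitivity and specificity follow their standard definitions; the definition of sparsity used here (the share of features left unselected) is an assumption, since the caption does not define it, and the function name is hypothetical.

```python
def selection_metrics(selected, truly_associated, all_features):
    """Sparsity, sensitivity, and specificity of a feature selection result."""
    selected = set(selected)
    truth = set(truly_associated)
    universe = set(all_features)
    true_pos = len(selected & truth)
    false_pos = len(selected - truth)
    false_neg = len(truth - selected)
    true_neg = len(universe - selected - truth)
    return {
        # Assumed definition: fraction of features NOT selected.
        "sparsity": 1 - len(selected) / len(universe),
        "sensitivity": true_pos / (true_pos + false_neg)
        if true_pos + false_neg
        else float("nan"),
        "specificity": true_neg / (true_neg + false_pos)
        if true_neg + false_pos
        else float("nan"),
    }


features = [f"f{i}" for i in range(10)]
metrics = selection_metrics(
    selected=["f0", "f5", "f6"],
    truly_associated=["f0", "f1"],
    all_features=features,
)
```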
Fig. 5. Application of best strategies highlights complementary biological interactions between microorganisms and metabolites in Konzo data.
(A) Proportion of cumulative explained variance in the metabolome and microbiome datasets in both affected and unaffected individuals. (B) Top 20 contributing species and metabolites on the first RDA component in healthy and affected samples. Positive correlations are marked with a + sign and negative correlations with a - sign. Projection of metabolites (red) and microorganisms (blue) into the 2D regression sPLS space in (C) affected and (D) unaffected individuals. Features with null loadings were removed from the analysis. (E) Coefficients provided by CODA-LASSO for mevalonate and 3-hydroxyisobutyrate, which were identified only in Konzo by the regression sPLS. Positive coefficients are marked with a + sign and negative coefficients with a - sign. (F) Network between mevalonate and 3-hydroxyisobutyrate and their associated species found by CODA-LASSO. Positive associations are represented by green edges and negative associations by pink edges.
