PLoS Comput Biol. 2021 Sep 7;17(9):e1009105.

doi: 10.1371/journal.pcbi.1009105. eCollection 2021 Sep.

Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis

Affiliations

¹ Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom.
² Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France.
³ Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom.
⁴ Section of Biomolecular Medicine, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom.
⁵ MetaToul-MetaboHUB, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France.

PMID: 34492007
PMCID: PMC8448349
DOI: 10.1371/journal.pcbi.1009105

Cecilia Wieder et al. PLoS Comput Biol. 2021.

Abstract

Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, differential metabolite selection methods, and pathway database used can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms, also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both the gain of false-positive pathways and the loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as for those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Over-Representation Analysis (ORA).**
**Venn diagram representing ORA parameters corresponding to Eq 1.** N represents compounds forming the background set, which covers part of the full metabolome. M represents compounds in the pathway of interest. n represents compounds of interest (i.e., differentially abundant metabolites), and k represents the overlap between the list of compounds of interest and compounds in the pathway.
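The caption refers to Eq 1, which is not reproduced on this page. ORA is conventionally scored with a one-sided hypergeometric test, and the minimal sketch below assumes that is the form of Eq 1, using the caption's symbols N, M, n and k. The function name `ora_pvalue` and the example counts are hypothetical; only the Python standard library is used.

```python
from math import comb


def ora_pvalue(N: int, M: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value for over-representation, P(X >= k).

    P(X >= k) = sum_{i=k}^{min(M, n)} C(M, i) * C(N - M, n - i) / C(N, n)

    N: number of compounds in the background set
    M: number of background compounds annotated to the pathway
    n: number of compounds of interest (e.g. DA metabolites) in the background
    k: number of compounds of interest that are also in the pathway
    """
    if not (0 <= M <= N and 0 <= n <= N and 0 <= k <= min(M, n)):
        raise ValueError("set sizes must satisfy M <= N, n <= N and k <= min(M, n)")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(M, n) + 1))
    return tail / comb(N, n)


# Hypothetical example: 200 measured compounds, a 20-compound pathway,
# 30 compounds of interest, 8 of which fall in the pathway.
p = ora_pvalue(N=200, M=20, n=30, k=8)
print(f"ORA p-value: {p:.3g}")
```

As a cross-check, SciPy's `scipy.stats.hypergeom.sf(k - 1, N, M, n)` should return the same tail probability, since SciPy parameterises the distribution as (population size, number of successes in the population, number of draws).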

**Fig 2. Effect of background set.**
A) Scatter plot of -log₁₀ p-values of pathways when using an assay-specific background set consisting of all measurable compounds in each dataset (x-axis), compared with a non-specific background set containing all compounds mapping to at least one KEGG pathway (y-axis). Dashed black lines represent a p-value threshold equivalent to p = 0.1. Regression lines are shown with shading representing the 95% confidence interval. B) Number of pathways significant at p ≤ 0.1 (solid bars) and at q < 0.1 (hashed bars, BH FDR correction). Datasets are ordered by the number of compounds mapping to KEGG pathways. C and D) The effect of reducing the size of the background set. C) Compounds were removed from the background set at random, and differentially abundant (DA) metabolites were identified based on the modified background set. D) Only non-DA compounds were removed from the background set at random. In panels A, C and D, dashed lines represent datasets where no chromatography/electrophoresis was used. Error bars represent the standard error of the mean.
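Fig 2 contrasts an assay-specific background with one containing every compound mapped to a KEGG pathway. The sketch below, built on hypothetical toy sets rather than the paper's data, illustrates why the choice matters: in this toy example, padding N with compounds the assay could never detect lowers the overlap expected under the null, so the same observed overlap k looks more extreme. `drop_non_da` loosely mimics the random background reduction of panel D; its name and the seeded sampling are illustrative assumptions.

```python
import random
from math import comb


def hypergeom_sf(N: int, M: int, n: int, k: int) -> float:
    """P(X >= k) for a hypergeometric draw of n items from N, M of which are in the pathway."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(M, n) + 1)) / comb(N, n)


def ora(background: set, pathway: set, hits: set) -> float:
    """ORA p-value for one pathway, with every set restricted to the background."""
    hits_bg = hits & background
    pathway_bg = pathway & background
    return hypergeom_sf(len(background), len(pathway_bg), len(hits_bg), len(hits_bg & pathway_bg))


def drop_non_da(background: set, hits: set, n_remove: int, seed: int = 0) -> set:
    """Randomly remove n_remove non-DA compounds from the background (hypothetical helper)."""
    rng = random.Random(seed)
    removable = sorted(background - hits)  # sorted so a given seed gives a reproducible draw
    return background - set(rng.sample(removable, n_remove))


# Hypothetical toy data: C1..C1000 stand in for every compound in a pathway database,
# C1..C100 for the compounds an assay can actually measure.
all_db = {f"C{i}" for i in range(1, 1001)}
assay = {f"C{i}" for i in range(1, 101)}
pathway = {f"C{i}" for i in range(1, 11)} | {f"C{i}" for i in range(101, 151)}
hits = {f"C{i}" for i in range(1, 6)} | {f"C{i}" for i in range(11, 21)}

p_assay = ora(assay, pathway, hits)          # N = 100,  M = 10, n = 15, k = 5
p_nonspecific = ora(all_db, pathway, hits)   # N = 1000, M = 60, n = 15, k = 5
print(f"assay-specific: {p_assay:.3g}, non-specific: {p_nonspecific:.3g}")
```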

**Fig 3. Number of DA metabolites.**
The effect of the number of DA metabolites in the list of metabolites of interest on the number of significant pathways (p ≤ 0.1) in the Labbé et al. dataset. Results corresponding to Bonferroni thresholds are denoted by red markers, while those corresponding to BH FDR thresholds are denoted by black markers. Marker shape (circle, cross, or triangle) represents the adjusted p-value threshold for DA metabolite selection (0.005, 0.05, and 0.1, respectively).
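Fig 3 varies how DA metabolites are selected: Bonferroni or BH FDR correction at adjusted thresholds of 0.005, 0.05 and 0.1. Below is a standard-library sketch of both corrections (statsmodels' `multipletests` provides the same adjustments via `method="bonferroni"` and `method="fdr_bh"`; they are written out here for transparency). The metabolite names and p-values are hypothetical, and the upstream univariate test that produces the p-values is not shown.

```python
def bonferroni(pvals: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg (BH) step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_da(names: list[str], pvals: list[float], method: str, threshold: float) -> set[str]:
    """Names whose adjusted p-value is at or below the threshold."""
    adjust = bonferroni if method == "bonferroni" else benjamini_hochberg
    return {name for name, q in zip(names, adjust(pvals)) if q <= threshold}


# Hypothetical per-metabolite p-values from some univariate test.
names = ["m1", "m2", "m3", "m4"]
pvals = [0.01, 0.04, 0.03, 0.5]
for method in ("bonferroni", "bh"):
    for threshold in (0.005, 0.05, 0.1):
        print(method, threshold, sorted(select_da(names, pvals, method, threshold)))
```

Because BH is less conservative than Bonferroni, the same threshold typically yields a longer list of metabolites of interest, which in turn changes n in the ORA test.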

**Fig 4. Comparison of pathway databases and database updates.**
A) Pathway size distribution of the KEGG, Reactome, and HumanCyc databases. Violin plots show the distribution of pathway size (number of compounds, log₁₀-transformed). Bold vertical lines show the median; dashed vertical lines show the lower and upper quartiles. B) Comparison of Reactome human pathway set (R-HSA) releases spanning 2017 (R61, June 2017) to 2020 (R75, December 2020). Data for release 67 were not available. Dot colour corresponds to release version, with lighter colours representing newer releases.

**Fig 5. Metabolite misidentification.**
The effect of compound misidentification by molecular weight (20 ppm window; dark bars) and by chemical formula (light bars) on the mean pathway loss rate (lower bars) and mean pathway gain rate (upper bars), averaged over 100 random resamplings at a 4% misidentification rate. Error bars represent the standard error of the mean.
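Fig 5 summarises a resampling experiment in which a fraction of identifications is deliberately corrupted. The sketch below is one plausible reading of the molecular-weight variant, under stated assumptions: each selected metabolite is swapped for a random background compound whose mass lies within ±20 ppm, pathways are called significant at p ≤ 0.1, and the gain rate is normalised by the number of originally significant pathways. These choices, the function names and any masses supplied are hypothetical and may differ from the authors' procedure.

```python
import random
from math import comb


def hypergeom_sf(N: int, M: int, n: int, k: int) -> float:
    """P(X >= k) for a hypergeometric draw of n items from N, M of which are in the pathway."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(M, n) + 1)) / comb(N, n)


def significant_pathways(background: set, pathways: dict, hits: set, alpha: float = 0.1) -> set:
    """Names of pathways whose ORA p-value is at or below alpha."""
    hits_bg = hits & background
    sig = set()
    for name, members in pathways.items():
        path_bg = members & background
        p = hypergeom_sf(len(background), len(path_bg), len(hits_bg), len(hits_bg & path_bg))
        if p <= alpha:
            sig.add(name)
    return sig


def mass_neighbours(compound: str, masses: dict, ppm: float = 20.0) -> list:
    """Other compounds whose mass lies within +/- ppm of this compound's mass."""
    m0 = masses[compound]
    tol = m0 * ppm * 1e-6
    return sorted(c for c, m in masses.items() if c != compound and abs(m - m0) <= tol)


def misidentify(hits: set, masses: dict, rate: float, rng: random.Random, ppm: float = 20.0) -> set:
    """Swap a fraction `rate` of hits for a random mass neighbour that is not already a hit."""
    swapped = set(hits)
    for c in rng.sample(sorted(hits), round(rate * len(hits))):
        candidates = [x for x in mass_neighbours(c, masses, ppm) if x not in swapped]
        if candidates:  # compounds with no neighbour in the window keep their identity
            swapped.discard(c)
            swapped.add(rng.choice(candidates))
    return swapped


def loss_gain_rates(true_sig: set, new_sig: set) -> tuple:
    """Fraction of original pathways lost, and new pathways gained per original pathway."""
    if not true_sig:
        return 0.0, 0.0
    return len(true_sig - new_sig) / len(true_sig), len(new_sig - true_sig) / len(true_sig)


def mean_rates(background, pathways, hits, masses, rate=0.04, n_rep=100, seed=0):
    """Average loss and gain rates over n_rep random misidentification resamplings."""
    rng = random.Random(seed)
    true_sig = significant_pathways(background, pathways, hits)
    losses, gains = [], []
    for _ in range(n_rep):
        new_sig = significant_pathways(background, pathways, misidentify(hits, masses, rate, rng))
        loss, gain = loss_gain_rates(true_sig, new_sig)
        losses.append(loss)
        gains.append(gain)
    return sum(losses) / n_rep, sum(gains) / n_rep
```

A formula-based variant would replace `mass_neighbours` with a lookup of compounds sharing the same chemical formula.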

**Fig 6. The effect of assay chemical specificity on pathways accessible in the KEGG metabolic network.**
Panels A and B are both based on the four assay types present in the Stevens et al. dataset. The colours in each panel correspond to the four assay types shown in the legend. A) KEGG reference metabolic network with compounds from each assay type highlighted on their respective pathways; the network was annotated using iPath 3 [22]. B) Venn diagram showing the number of KEGG pathways accessible using the compounds in each of the four assay types. Numbers outside the Venn diagram indicate the total number of pathways accessible with each assay type. The Venn diagram was created using InteractiVenn [23].
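Fig 6B counts the KEGG pathways accessible to each assay type. A minimal sketch, assuming a pathway counts as accessible when at least one of its compounds is measured by the assay (the exact criterion is not stated on this page). It produces a partial Venn-style summary: totals, pairwise overlaps and pathways unique to each assay. The function names and toy sets are hypothetical.

```python
from itertools import combinations


def accessible_pathways(assay_compounds: set, pathways: dict, min_compounds: int = 1) -> set:
    """Pathways containing at least `min_compounds` compounds the assay can measure."""
    return {name for name, members in pathways.items()
            if len(members & assay_compounds) >= min_compounds}


def overlap_summary(assays: dict, pathways: dict) -> dict:
    """Pathway counts per assay, per pair of assays, and unique to each assay."""
    acc = {a: accessible_pathways(c, pathways) for a, c in assays.items()}
    summary = {f"total:{a}": len(s) for a, s in acc.items()}
    for a, b in combinations(sorted(acc), 2):
        summary[f"shared:{a}&{b}"] = len(acc[a] & acc[b])
    for a, s in acc.items():
        # Union of every other assay's accessible pathways (empty if there is only one assay)
        others = set().union(*(acc[o] for o in acc if o != a))
        summary[f"unique:{a}"] = len(s - others)
    return summary


# Hypothetical toy pathways and two assay types.
pathways = {"P1": {"c1", "c2"}, "P2": {"c3"}, "P3": {"c4", "c5"}}
assays = {"A": {"c1", "c3"}, "B": {"c2", "c4"}}
print(overlap_summary(assays, pathways))
```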

Cited by

Knowledge Graphs for drug repurposing: a review of databases and methods.
Perdomo-Quinteiro P, Belmonte-Hernández A. Brief Bioinform. 2024 Sep 23;25(6):bbae461. doi: 10.1093/bib/bbae461. PMID: 39325460. Review.
Exploration of Blood Metabolite Signatures of Colorectal Cancer and Polyposis through Integrated Statistical and Network Analysis.
Di Cesare F, Vignoli A, Luchinat C, Tenori L, Saccenti E. Metabolites. 2023 Feb 17;13(2):296. doi: 10.3390/metabo13020296. PMID: 36837915.
Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges.
Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L; for topic group “High-dimensional data” (TG9) of the STRATOS initiative. BMC Med. 2023 May 15;21(1):182. doi: 10.1186/s12916-023-02858-y. PMID: 37189125. Review.
PathBank 2.0-the pathway database for model organism metabolomics.
Wishart DS, Kruger R, Sivakumaran A, Harford K, Sanford S, Doshi R, Kehrtarpal N, Fatokun O, Doucet D, Zubkowski A, Jackson H, Sykes G, Ramirez-Gaona M, Marcu A, Li C, Yee K, Garros C, Rayat DY, Coleongco J, Nandyala T, Gautam V, Oler E. Nucleic Acids Res. 2024 Jan 5;52(D1):D654-D662. doi: 10.1093/nar/gkad1041. PMID: 37962386.
GINv2.0: a comprehensive topological network integrating molecular interactions from multiple knowledge bases.
Chang X, Yan S, Zhang Y, Zhang Y, Li L, Gao Z, Lin X, Chi X. NPJ Syst Biol Appl. 2024 Jan 13;10(1):4. doi: 10.1038/s41540-024-00330-y. PMID: 38218959.

References

1. Nguyen TM, Shafi A, Nguyen T, Draghici S. Identifying significantly impacted pathways: A comprehensive review and assessment. Genome Biol. 2019;20. doi: 10.1186/s13059-019-1790-4
2. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: Current approaches and outstanding challenges. Ouzounis CA, editor. PLoS Computational Biology. Public Library of Science; 2012. p. e1002375. doi: 10.1371/journal.pcbi.1002375
3. Karnovsky A, Li S. Pathway Analysis for Targeted and Untargeted Metabolomics. Methods in Molecular Biology. Humana Press Inc.; 2020. pp. 387–400. doi: 10.1007/978-1-0716-0239-3_19
4. Marco-Ramell A, Palau-Rodriguez M, Alay A, Tulipani S, Urpi-Sarda M, Sanchez-Pla A, et al. Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data. BMC Bioinformatics. 2018;19:1. doi: 10.1186/s12859-017-2006-0
5. García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E. Pathway analysis: State of the art. Frontiers in Physiology. Frontiers Research Foundation; 2015. doi: 10.3389/fphys.2015.00383

Grants and funding

BB/T007974/1, Biotechnology and Biological Sciences Research Council, United Kingdom
