Ranking metrics in gene set enrichment analysis: do they matter?
- PMID: 28499413
- PMCID: PMC5427619
- DOI: 10.1186/s12859-017-1674-0
Abstract
Background: Many methods exist for describing how gene expression in molecular pathways or gene ontology categories changes under different experimental conditions. Among them, Gene Set Enrichment Analysis (GSEA) appears to be one of the most commonly used (over 10,000 citations). An important parameter that can affect the final result is the choice of metric used to rank genes. Applying a default ranking metric may lead to poor results.
Methods and results: In this work, 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics, including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established: the absolute value of the Moderated Welch Test statistic, Minimum Significant Difference, the absolute value of the Signal-To-Noise ratio and the Baumgartner-Weiss-Schindler test statistic. For false positive rate estimation, all selected ranking metrics were robust with respect to sample size. For sensitivity, the absolute value of the Moderated Welch Test statistic and the absolute value of the Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA.
Conclusions: Choosing a ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test statistic has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using the Baumgartner-Weiss-Schindler test statistic gives better outcomes. It also finds more enriched pathways than the other tested metrics, which may lead to new biological discoveries.
Keywords: Functional genomics; GSEA; Pathway analysis; Ranking metrics.
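To make the idea of a ranking metric concrete, the sketch below scores genes by the absolute value of the Signal-To-Noise ratio, one of the four metrics highlighted in the abstract, using the common definition |mean_a - mean_b| / (sd_a + sd_b). It is a minimal illustration, not the authors' MATLAB implementation: the function names, the gene names and the toy expression values are all hypothetical, and any adjustment of the standard deviations that a production tool might apply is omitted.

```python
from statistics import mean, stdev


def abs_signal_to_noise(group_a, group_b):
    """Return |mean_a - mean_b| / (sd_a + sd_b) for one gene.

    Uses sample standard deviations; each group needs at least two values.
    """
    spread = stdev(group_a) + stdev(group_b)
    if spread == 0:
        raise ValueError("both groups have zero variance")
    return abs(mean(group_a) - mean(group_b)) / spread


def rank_genes(expression, labels):
    """Order genes from highest to lowest absolute Signal-To-Noise ratio.

    expression: dict mapping gene name -> list of expression values
    labels:     list of 0/1 class labels, one per sample (hypothetical layout)
    """
    scores = {}
    for gene, values in expression.items():
        group_a = [v for v, lab in zip(values, labels) if lab == 0]
        group_b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs_signal_to_noise(group_a, group_b)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented for illustration): three genes, three samples per class.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "GENE_B": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],
    "GENE_C": [2.0, 4.0, 6.0, 3.0, 5.0, 7.0],
}
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(expression, labels)
print(ranking)  # ['GENE_A', 'GENE_C', 'GENE_B']
```

Because the absolute value is taken, strongly up-regulated and strongly down-regulated genes both land at the top of the list; swapping in a different scoring function changes the ordering that the enrichment step then consumes.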