Bioinformatics. 2017 Feb 1;33(3):414-424.
doi: 10.1093/bioinformatics/btw623.

Combining multiple tools outperforms individual methods in gene set enrichment analyses

Monther Alhamdoosh et al. Bioinformatics.

Abstract

Motivation: Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions.

Results: The ensemble of gene set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility for effectively and efficiently extrapolating biological functions and potential involvement in disease processes from lists of differentially regulated genes.
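To make the idea of a collective gene set score concrete, the following is a minimal sketch of one simple way to combine several methods: rank the gene sets within each method by p-value, then average those ranks. This is an illustration only, not EGSEA's implementation. The method names, gene set names, p-values, and the helper functions `rank_p_values` and `average_rank` are all hypothetical, and the abstract does not specify how EGSEA computes its collective scores.

```python
from statistics import mean


def rank_p_values(p_values):
    """Rank gene sets by p-value (smallest = rank 1); ties share the average rank."""
    ordered = sorted(p_values, key=p_values.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every gene set tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and p_values[ordered[j + 1]] == p_values[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def average_rank(results_by_method):
    """Combine per-method rankings into one list sorted by mean rank (best first)."""
    per_method = [rank_p_values(p) for p in results_by_method.values()]
    # Only score gene sets that every method reported.
    gene_sets = set.intersection(*(set(r) for r in per_method))
    combined = {gs: mean(r[gs] for r in per_method) for gs in gene_sets}
    return sorted(combined.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical p-values from three hypothetical GSE methods.
results_by_method = {
    "method_a": {"set_1": 0.001, "set_2": 0.04, "set_3": 0.20},
    "method_b": {"set_1": 0.03, "set_2": 0.01, "set_3": 0.50},
    "method_c": {"set_1": 0.002, "set_2": 0.30, "set_3": 0.10},
}

ensemble = average_rank(results_by_method)
for gene_set, score in ensemble:
    print(f"{gene_set}\t{score:.2f}")
```

In this toy input, no single method agrees with the others on the full ordering, yet the averaged ranks still give one consensus ordering. Averaging ranks rather than raw p-values keeps methods with differently calibrated statistics on a common scale.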

Availability and implementation: EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene set collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/.

Contacts: monther.alhamdoosh@csl.com.au, mritchie@wehi.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. A schematic overview of the EGSEA pipeline for gene set enrichment analysis.

Fig. 2. Multidimensional scaling plot based on the gene set rankings of the KEGG signalling and disease collections for ten GSE methods applied to the Human IL-13 versus control dataset. Methods that perform similarly on this dataset cluster together.

Fig. 3. Visualization of the gene sets retrieved by EGSEA at different levels. (A) Summary plots of EGSEA on the human dataset. The IDs of the top ten pathways based on EGSEA average rank are highlighted in black font and the top five pathways based on EGSEA significance score whose average ranks are not in the top ten ranks are highlighted in blue font. The bubble size indicates the level of pathway significance. The red and blue colours indicate that the majority of gene set genes are up- or down-regulated, respectively. (B) Heat maps of the gene expression fold-changes in three selected gene sets.
