2017 Mar 3;3:6.
doi: 10.1038/s41540-017-0007-2. eCollection 2017.

On the performance of de novo pathway enrichment

Richa Batra et al. NPJ Syst Biol Appl.

Abstract

De novo pathway enrichment is a powerful approach to discover previously uncharacterized molecular mechanisms in addition to already known pathways. To achieve this, condition-specific functional modules are extracted from large interaction networks. Here, we give an overview of the state of the art and present the first framework for assessing the performance of existing methods. We identified 19 tools and selected seven representative candidates for a comparative analysis with more than 12,000 runs, spanning different biological networks, molecular profiles, and parameters. Our results show that none of the methods consistently outperforms the others. To mitigate this issue for biomedical researchers, we provide guidelines for choosing the appropriate tool for a given dataset. Moreover, our framework is the first attempt at a quantitative evaluation of de novo methods, which will allow the bioinformatics community to objectively compare future tools against the state of the art.

Figures

Fig. 1
A typical workflow for de novo pathway enrichment. The underlying hypothesis is that phenotype-specific genes (foreground, FG) are differentially expressed in many case samples compared to a control group (1, 2). By using statistical tests, one can determine which genes are affected by the phenotype (3) and overlay this information on an interaction network (4). De novo pathway enrichment tools aim to extract sub-networks enriched with phenotype-specific FG genes (5). Comparing several such methods is an open issue.
Fig. 2
Average performance for over 80 FG sets of size 20 nodes, generated using the AVDk algorithm, with varying signal strength (a, b) and varying sparsity (c, d). Expression profiles were simulated with varying mean (VM) (a, c) and varying variation (VV) (b, d). The HPRD network was used as the input network and the performance was assessed using the F-measure. The error bars (a, b) and box plots (c, d) represent performance over several FG nodes and over a range of internal parameter settings for each tool. The higher the signal strength, the more the expression profiles of the FG genes differ from those of the background (BG) genes, so pathway enrichment methods can be expected to identify the FG genes more easily. For details on internal parameters, signal strength, and sparsity values, please refer to Supplementary Tables 1, 3, and 5, respectively.
Fig. 3
Average performance for over 80 FG sets of size 20 nodes generated using the AVDk algorithm. Expression profiles were simulated with VM. The HPRD network was used as the input network. Performance was assessed with the F-measure for a range of internal parameter settings for each tool, i.e. the expected pathway size (M) for GXNA, the number of allowed exceptions (outliers) in a pathway (L) for KPM, and the pathway false discovery rate (FDR) for BioNet.
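The F-measure used in Figs. 2 and 3 can be illustrated with a minimal sketch. It assumes the standard definition (harmonic mean of precision and recall) applied to the set of nodes a tool returns versus the known FG node set; the function name `f_measure` and the gene labels are hypothetical and not taken from the paper.

```python
def f_measure(predicted_nodes, foreground_nodes):
    """Harmonic mean of precision and recall between two node sets.

    Assumption: the extracted sub-network is scored purely by its node
    overlap with the known foreground (FG) set.
    """
    predicted = set(predicted_nodes)
    foreground = set(foreground_nodes)
    true_positives = len(predicted & foreground)
    if true_positives == 0:
        # No overlap (or an empty set): define the score as 0.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(foreground)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 FG genes, a tool returns 3 genes, 2 of them correct.
fg_genes = {"G1", "G2", "G3", "G4"}
extracted = {"G1", "G2", "G5"}
score = f_measure(extracted, fg_genes)  # precision 2/3, recall 1/2
```

A score of 1 would mean the extracted sub-network contains exactly the FG nodes; the score drops both when FG nodes are missed and when BG nodes are included.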
Fig. 4
Illustration of the models used to generate FG and BG expression distributions for case and control samples: VV in (a) and VM in (b).
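The two simulation models named above can be sketched roughly as follows. This is only an illustration under stated assumptions: the excerpt does not give the actual distributions, so the standard normal baseline, the `effect` parameter, and the function name `simulate_profiles` are all hypothetical. In this sketch, VM shifts the mean of FG genes in case samples, and VV changes their standard deviation; BG genes and all control samples keep the baseline distribution.

```python
import random
import statistics


def simulate_profiles(genes, fg_genes, n_cases, n_controls, model, effect, seed=0):
    """Simulate expression values per gene for case and control samples.

    Assumptions (not from the paper): baseline expression is N(0, 1);
    under "VM" the FG case mean becomes `effect`, under "VV" the FG case
    standard deviation becomes `effect`.
    """
    if model not in ("VM", "VV"):
        raise ValueError("model must be 'VM' or 'VV'")
    rng = random.Random(seed)
    fg = set(fg_genes)
    profiles = {}
    for gene in genes:
        # Controls always follow the baseline distribution.
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        mean, sd = 0.0, 1.0
        if gene in fg:
            if model == "VM":
                mean = effect  # varying mean
            else:
                sd = effect    # varying variation
        cases = [rng.gauss(mean, sd) for _ in range(n_cases)]
        profiles[gene] = {"cases": cases, "controls": controls}
    return profiles


# Hypothetical usage: one FG gene ("a") and one BG gene ("b").
vm = simulate_profiles(["a", "b"], ["a"], 200, 200, "VM", effect=2.0)
vv = simulate_profiles(["a", "b"], ["a"], 200, 200, "VV", effect=3.0)
vm_fg_case_mean = statistics.mean(vm["a"]["cases"])
vv_fg_case_var = statistics.pvariance(vv["a"]["cases"])
```

Under these assumptions, a larger `effect` plays the role of a stronger signal: FG genes become easier to separate from BG genes by a test on means (VM) or on variances (VV).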
