BMC Bioinformatics. 2008 Jan 4;9:3.
doi: 10.1186/1471-2105-9-3.

Extending pathways based on gene lists using InterPro domain signatures

Florian Hahne et al. BMC Bioinformatics. 2008.

Abstract

Background: High-throughput technologies such as functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique to test for statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes investigated in such experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. It works by comparing the InterPro domain signatures of the candidate gene lists with the domain signatures of gene sets derived from known classifications, e.g. KEGG pathways.
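To make the idea of comparing domain signatures concrete, here is a minimal Python sketch. It is not the domainsignatures implementation: the gene names, the gene-to-domain mapping, and the simple shared-domain score are all illustrative assumptions, and the abstract does not specify the similarity measure at this level of detail.

```python
from collections import Counter

# Hypothetical toy annotation: gene -> set of InterPro domain identifiers.
# The genes and their assignments are placeholders, not real annotation data.
GENE_DOMAINS = {
    "geneA": {"IPR000719", "IPR011009"},
    "geneB": {"IPR000719"},
    "geneC": {"IPR001245", "IPR011009"},
    "geneD": {"IPR002048"},
}

def domain_signature(genes, gene_domains):
    """Count how many genes in the list carry each domain."""
    counts = Counter()
    for gene in genes:
        counts.update(gene_domains.get(gene, set()))
    return counts

def shared_domain_fraction(list_sig, pathway_sig):
    """Fraction of the list's distinct domains that also occur in the pathway.

    This is an assumed stand-in score, chosen only for illustration.
    """
    if not list_sig:
        return 0.0
    shared = sum(1 for domain in list_sig if domain in pathway_sig)
    return shared / len(list_sig)

pathway_sig = domain_signature(["geneA", "geneB"], GENE_DOMAINS)
candidate_sig = domain_signature(["geneC", "geneD"], GENE_DOMAINS)
score = shared_domain_fraction(candidate_sig, pathway_sig)
```

In this toy case the candidate list carries three distinct domains, one of which also appears in the pathway signature, so the score is one third. The point is that the candidate genes need no pathway annotation of their own, only domain annotation.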

Results: To validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we created test gene lists by randomly selecting pathway genes, removing these genes from the known pathways, and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that pathway memberships can be recovered from the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example.

Conclusion: Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, for routinely performing this analysis on the results of high-throughput screening is available via Bioconductor.

Figures

Figure 1
Simulation workflow. For the simulation study we sampled increasing numbers of genes from a given pathway and the same number of random genes. Genes that were present in both the pathway and the gene lists were removed from the pathway definition. InterPro domain signatures were constructed for both gene lists and for the pathway, and measures of significance between the gene lists and the pathway were computed. This procedure was repeated 1,000 times and the resulting similarity measures were recorded.
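The workflow in this caption can be sketched as a short Python loop. This is only an illustration under stated assumptions: the synthetic gene universe, the domain assignments, and the `overlap_score` function are hypothetical stand-ins, not the measure of significance used in the paper.

```python
import random

def domains_of(genes, gene_domains):
    """Collect the distinct domains carried by a list of genes."""
    found = set()
    for gene in genes:
        found |= gene_domains[gene]
    return found

def overlap_score(genes, pathway_genes, gene_domains):
    """Assumed stand-in score: share of the list's domains seen in the pathway."""
    list_domains = domains_of(genes, gene_domains)
    if not list_domains:
        return 0.0
    pathway_domains = domains_of(pathway_genes, gene_domains)
    return len(list_domains & pathway_domains) / len(list_domains)

def simulate(pathway, universe, gene_domains, n, reps=1000, seed=0):
    """Repeat the sampling procedure and record both sets of scores."""
    rng = random.Random(seed)
    pathway_pool = sorted(pathway)
    universe_pool = sorted(universe)
    pathway_scores, random_scores = [], []
    for _ in range(reps):
        sampled = rng.sample(pathway_pool, n)      # genes from the pathway
        background = rng.sample(universe_pool, n)  # same number of random genes
        # Remove genes that appear in either list from the pathway definition.
        reduced = set(pathway) - set(sampled) - set(background)
        pathway_scores.append(overlap_score(sampled, reduced, gene_domains))
        random_scores.append(overlap_score(background, reduced, gene_domains))
    return pathway_scores, random_scores

# Synthetic toy data: 200 genes, the first 30 form the pathway and share
# five pathway-typical domains; all other genes carry unrelated domains.
universe = [f"g{i}" for i in range(200)]
pathway = set(universe[:30])
gene_domains = {
    g: ({f"D{i % 5}"} if i < 30 else {f"D{5 + i % 35}"})
    for i, g in enumerate(universe)
}

pathway_scores, random_scores = simulate(pathway, universe, gene_domains, n=5)
mean_pathway = sum(pathway_scores) / len(pathway_scores)
mean_random = sum(random_scores) / len(random_scores)
```

Because the sampled genes are removed from the pathway before scoring, a high score for the sampled list reflects shared domain content with the remaining pathway members rather than direct gene overlap.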
Figure 2
Pathway classification. Simulated classification of KEGG pathways based on a binomial similarity measure. a) Single pathway (hsa04650, natural killer cell mediated cytotoxicity). The red curve shows the similarity to the pathway for increasing numbers of sampled pathway genes. The blue curve shows the similarity to the pathway for increasing numbers of random genes. Each point is an average over 1,000 independent samples; the sampling variance indicated by the error bars is negligible. b) Results for all 181 KEGG pathways. The boxplots show AUC values indicating the degree of separation between 1,000 randomly sampled gene lists and lists containing pathway genes for all pathways, again for increasing numbers of sampled genes.
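As a reminder of what an AUC value of this kind measures, the following Python sketch computes it directly from two lists of similarity scores using the rank-based (Mann-Whitney) definition. The example scores are invented for illustration, and the caption does not state how the AUC was computed in the paper.

```python
def auc(positive_scores, negative_scores):
    """Probability that a pathway-list score exceeds a random-list score.

    Ties count as half a win. A value of 1.0 means perfect separation,
    0.5 means the two groups are indistinguishable.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Invented example scores: lists containing pathway genes vs. random lists.
pathway_list_scores = [0.9, 0.8, 0.4]
random_list_scores = [0.5, 0.3, 0.1]
separation = auc(pathway_list_scores, random_list_scores)
```

Here eight of the nine score pairs favour the pathway list, so the AUC is 8/9.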
Figure 3
Pathway classification with noisy data. Simulated classification of KEGG pathways after addition of noise in the form of random genes that are not assigned to the respective pathways. The number of noise genes was fixed at 50. The similarity measures and graphs were produced as for Figure 2. a) Single pathway. b) All 181 KEGG pathways.
Figure 4
Sensitivity and specificity. Expected sensitivity (left) and specificity (right) of our method, estimated through simulation of 100 sampled gene lists containing genes from varying numbers of KEGG pathways. Five genes were sampled from each of 3, 5 or 10 different pathways, and an additional 50 random noise genes were added each time. Sensitivity and specificity were above 90% in all cases; the sampling variance indicated by the error bars is negligible.
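For readers who want the two quantities spelled out, the sketch below computes sensitivity and specificity for a single simulated gene list from the set of pathways that truly contributed genes and the set the method reports. The pathway labels and the example prediction are hypothetical; how the paper decides which pathways count as reported is not given in this caption.

```python
def sensitivity_specificity(true_pathways, predicted_pathways, all_pathways):
    """Return (sensitivity, specificity) for one simulated gene list."""
    true_set = set(true_pathways)
    pred_set = set(predicted_pathways)
    all_set = set(all_pathways)
    tp = len(true_set & pred_set)            # contributing and reported
    fn = len(true_set - pred_set)            # contributing but missed
    fp = len(pred_set - true_set)            # reported but not contributing
    tn = len(all_set - true_set - pred_set)  # correctly not reported
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical example: 10 pathways, 3 of which contributed genes.
all_pathways = [f"P{i}" for i in range(10)]
true_pathways = ["P0", "P1", "P2"]
predicted_pathways = ["P0", "P1", "P5"]
sens, spec = sensitivity_specificity(true_pathways, predicted_pathways, all_pathways)
```

In this example two of the three contributing pathways are recovered (sensitivity 2/3) and six of the seven non-contributing pathways are correctly left out (specificity 6/7).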
