2009 Feb 27;10:73.
doi: 10.1186/1471-2105-10-73.

Disease candidate gene identification and prioritization using protein interaction networks

Jing Chen et al. BMC Bioinformatics.

Erratum in

  • BMC Bioinformatics. 2009;10:406

Abstract

Background: Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses.

Results: For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method, are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and to evaluate and compare the performance of our approach. Under appropriate settings - for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and a step size of 6 for the K-Step Markov method - the three methods achieved comparable AUC values, suggesting similar performance.
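As a rough illustration of the PageRank with Priors idea mentioned in the Results, the following Python sketch runs a random walk on an undirected interaction graph and, at each step, jumps back to the seed genes with the back probability. The update rule follows the commonly cited formulation of the algorithm and is an assumption here, since the abstract does not spell it out; the function name, the toy network, and the gene labels A to D are hypothetical.

```python
def pagerank_with_priors(adjacency, seeds, back_prob=0.3, n_iter=100):
    """Score every node by a random walk that restarts at the seed nodes.

    adjacency: dict mapping node -> list of neighbours (undirected graph,
               so each edge is listed in both directions).
    seeds:     set of training ("seed") nodes; the prior is uniform over them.
    back_prob: probability of jumping back to a seed at each step.
    """
    nodes = list(adjacency)
    prior = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(prior)
    for _ in range(n_iter):
        # Restart term: back_prob times the prior.
        nxt = {v: back_prob * prior[v] for v in nodes}
        # Walk term: each node spreads its score evenly over its neighbours.
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            share = (1.0 - back_prob) * score[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        score = nxt
    return score


# Hypothetical toy network: a simple path A - B - C - D with A as the seed.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
toy_seeds = {"A"}
scores = pagerank_with_priors(toy_network, toy_seeds, back_prob=0.3)

# Rank the non-seed (candidate) genes from highest to lowest score.
ranked = sorted((g for g in toy_network if g not in toy_seeds),
                key=lambda g: scores[g], reverse=True)
```

In this toy case the candidates closest to the seed rank highest. A larger back probability keeps more of the score near the seeds, while a smaller one lets it spread further through the network.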

Conclusion: Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.

Figures

Figure 1
ROC curves from cross validations. This figure shows representative ROC curves using PageRank with Priors with back probability 0.01, 0.05, 0.1, 0.3 and 0.5, and HITS with Priors with back probability 0.3 and 0.5. The random curve was derived from prioritization of the random training set using the PageRank with Priors method with back probability 0.3.
Figure 2
ROC curves from cross validations. This figure shows representative ROC curves using the K-Step Markov method with K = 1, 2, 4, and 6. The random curve was derived from prioritization of the random training set using the PageRank with Priors method with back probability 0.3.
Figure 3
Plots of AUC with different parameter values. The left panel shows the AUC values of PageRank with Priors with back probability varied from 0.01 to 0.9. The right panel shows the AUC values of the K-Step Markov method with random walk length varied from 1 to 6. The vertical bars indicate the standard deviations.
Figure 4
Prioritized candidate genes for cardiac septal defects using both functional annotation- and PPIN-based methods. Panel A shows the sub-network of heart septal defect related genes comprising (i) genes associated with OMIM diseases that have the phenotype of cardiac septal defect (training set genes for cardiac septal defect) and (ii) their immediate interactants (test set genes). The size of each node is proportional to its degree (number of edges). Panel B shows the intersection among the top 20 ranked cardiac septal defect candidate genes using functional annotation- and PPIN-based methods. Functional annotation-based prioritization was done using the ToppGene server. For the PPIN-based methods, K-Step Markov, HITS with Priors, and PageRank with Priors were used. Panel C shows the top 20 ranked cardiac septal defect genes (generated using PPIN- and functional annotation-based methods) along with their connectivity to training set genes (based on protein-protein interactions).
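The K-Step Markov scores and AUC values referred to in Figures 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: the K-Step Markov score is taken to be the summed probability of visiting a node within K steps of a random walk started at the seeds, and the AUC is computed as the fraction of (positive, negative) pairs ranked in the right order, with ties counted as one half. The function names, the toy network, and the gene labels are hypothetical.

```python
def k_step_markov(adjacency, seeds, k=6):
    """Sum the visiting probabilities of a random walk over steps 1..k.

    The walk starts from a uniform distribution over the seed nodes and
    moves to a uniformly chosen neighbour at every step.
    """
    nodes = list(adjacency)
    current = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    total = {v: 0.0 for v in nodes}
    for _ in range(k):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            share = current[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        current = nxt
        # Accumulate the distribution reached after this step.
        for v in nodes:
            total[v] += current[v]
    return total


def auc(scores, positives):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy network: a simple path A - B - C - D with A as the seed.
path_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
two_step = k_step_markov(path_network, {"A"}, k=2)

# Score only the candidates (non-seed genes) and pretend B is the held-out
# disease gene; an AUC of 1.0 means it outranks every other candidate.
candidate_scores = {g: s for g, s in two_step.items() if g != "A"}
toy_auc = auc(candidate_scores, {"B"})
```

Repeating this for different values of K, or for different back probabilities in PageRank with Priors, and averaging the AUC over held-out genes gives parameter sweeps of the kind plotted in Figure 3.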
