PLoS One. 2013;8(1):e54325.
doi: 10.1371/journal.pone.0054325. Epub 2013 Jan 17.

Evaluation of physical and functional protein-protein interaction prediction methods for detecting biological pathways


Vijaykumar Yogesh Muley et al. PLoS One. 2013.

Abstract

Background: Cellular activities are governed by physical and functional interactions among proteins involved in various biological pathways. With the availability of sequenced genomes and high-throughput experimental data, genome-wide protein-protein interactions can be identified using various computational techniques. Comparative assessments of how well these techniques predict protein interactions have frequently been reported in the literature, but their ability to elucidate a particular biological pathway has not.

Methods: To understand how well interactions among the proteins of a specific biological pathway can be predicted, we analyzed 14 biological pathways of Escherichia coli catalogued in the KEGG database using five protein-protein functional linkage prediction methods: phylogenetic profiling, gene neighborhood, co-presence of orthologous genes in the same gene clusters, a mirrortree variant, and expression similarity.
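Three of the five methods reduce to comparing two numeric vectors per protein pair: presence/absence profiles across reference genomes (phylogenetic profiling), expression levels across conditions (expression similarity), and inter-species distance matrices built from aligned orthologs (mirrortree). The Python sketch below assumes Pearson correlation as the similarity measure for all three; the paper's exact measures, and its new correction that removes speciation information from the mirrortree matrices, are not reproduced. All function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no co-variation signal
    return sxy / math.sqrt(sxx * syy)


def phylogenetic_profile_score(p: Sequence[int], q: Sequence[int]) -> float:
    """PP (assumed measure): correlate presence(1)/absence(0) across reference genomes."""
    return pearson(p, q)


def expression_similarity(e1: Sequence[float], e2: Sequence[float]) -> float:
    """ES (assumed measure): correlate two genes' expression across conditions."""
    return pearson(e1, e2)


def mirrortree_score(
    d1: Sequence[Sequence[float]], d2: Sequence[Sequence[float]]
) -> float:
    """Plain mirrortree: correlate upper triangles of two inter-species distance
    matrices (same species order). No speciation correction is applied here."""
    n = len(d1)
    u1 = [d1[i][j] for i in range(n) for j in range(i + 1, n)]
    u2 = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(u1, u2)
```

For example, `expression_similarity([1, 2, 3], [3, 2, 1])` returns -1.0 for perfectly anti-correlated profiles. Binary phylogenetic profiles are often compared with other measures such as mutual information or Hamming distance; Pearson is used here only to keep the three scores in one form.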

Conclusions: Our results reveal that predicting protein interactions in metabolic pathways remains a challenging task for all methods, which possibly reflects the flexible/independent evolutionary histories of these proteins. These methods predicted functional associations among proteins of the amino acid, nucleotide, glycan, and vitamin & co-factor pathways slightly better than their near-random performance on carbohydrate, lipid and energy metabolism. We made similar observations for interactions among environmental information processing proteins. In contrast, protein-protein linkages involved in genetic information processing, or in specialized processes such as motility that occur in a subset of organisms, are predicted with comparable accuracy. Metabolic pathways are best predicted using the neighborhood of orthologous genes, whereas the phyletic pattern is sufficient to reconstruct the protein interactions of central dogma pathways. We have also shown that the effective use of a particular prediction method depends on the pathway under investigation. If one is not focused on a specific pathway, the gene expression similarity method is the best option.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. A schematic representation of the approach.
A) Each arrow points to a particular protein-protein physical/functional interaction prediction method. Gene Cluster (GC) calculates the probability that orthologs of the query proteins are encoded in the same gene cluster in the reference genomes. Gene clusters were defined as sets of unidirectional genes with intergenic distances within 100 nucleotides. In the given example, genes encoding orthologs of query proteins C and D co-occur in the same cluster in three reference genomes, hence their interaction score is 3/4. The Gene Neighbor (GN) method calculates interaction scores for query protein pairs based on the minimum chromosomal distance between the genes encoding their orthologs in any one reference genome, irrespective of gene orientation. In the given example, the minimum distance for proteins C and D is found in one of the reference genomes. The Expression Similarity (ES) method correlates the expression profiles of protein-coding genes across various conditions. In the given plot, two genes show almost identical expression across conditions and hence are likely to interact. Phylogenetic Profiling (PP) calculates interaction scores based on the co-occurrence of proteins in multiple genomes. The phyletic patterns of orthologs of proteins E and D are shown as rows of filled colored circles, and each vertical column represents an individual reference genome. Black circles represent absence of an ortholog; other circles represent its presence. The Genome Distance-Mirrortree (GM) method compares distance matrices derived from aligned orthologs of the query proteins. Prior to comparison, we correct these matrices to exclude speciation information using a new approach. B) A set of 14 pathways catalogued in KEGG was used as the benchmarking dataset. Protein pairs that co-occur in the pathway under consideration (for example, Nucleotide metabolism, highlighted in red) were treated as positives, and all other pathway protein pairs were treated as negatives.
C) We calculate interaction scores using the five methods described above for the positives and negatives of each pathway, as shown in the table. D) We compare the accuracy of the protein-protein interaction prediction methods for each KEGG pathway using Receiver Operating Characteristic (ROC) curves.
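The GC and GN scores in panel A can be sketched from gene coordinates alone. This Python sketch assumes that each reference genome is a list of genes annotated with the query protein their product is orthologous to, that coordinates are on a linear chromosome, that clusters are runs of same-strand genes separated by at most 100 nucleotides, and that the GC denominator is the number of reference genomes; GN is returned as a raw nucleotide distance rather than the paper's score. The `Gene` type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    ortholog_of: str  # hypothetical: query protein this gene's product is orthologous to
    start: int        # start coordinate on a linear chromosome (start <= end)
    end: int
    strand: str       # '+' or '-'


def gene_clusters(genome: List[Gene], max_gap: int = 100) -> List[List[Gene]]:
    """Split a genome into runs of same-strand genes separated by <= max_gap nt."""
    clusters: List[List[Gene]] = []
    for g in sorted(genome, key=lambda x: x.start):
        prev = clusters[-1][-1] if clusters else None
        if prev is not None and g.strand == prev.strand and g.start - prev.end <= max_gap:
            clusters[-1].append(g)  # extend the current unidirectional cluster
        else:
            clusters.append([g])    # strand switch or large gap starts a new cluster
    return clusters


def gc_score(a: str, b: str, genomes: List[List[Gene]]) -> float:
    """GC: fraction of reference genomes where orthologs of a and b share a cluster."""
    hits = 0
    for genome in genomes:
        if any({a, b} <= {g.ortholog_of for g in c} for c in gene_clusters(genome)):
            hits += 1
    return hits / len(genomes)


def gn_min_distance(a: str, b: str, genomes: List[List[Gene]]) -> Optional[int]:
    """GN: smallest gap (nt) between orthologs of a and b in any genome, any strand.
    Returns None if no genome encodes orthologs of both."""
    best: Optional[int] = None
    for genome in genomes:
        for ga in (g for g in genome if g.ortholog_of == a):
            for gb in (g for g in genome if g.ortholog_of == b):
                # Overlapping or nested genes get distance 0.
                gap = max(0, max(ga.start, gb.start) - min(ga.end, gb.end))
                if best is None or gap < best:
                    best = gap
    return best
```

With toy genomes in which orthologs of C and D are adjacent on the same strand in three of four genomes, `gc_score("C", "D", genomes)` gives 0.75, matching the worked example; GN ignores strand, so a pair on opposite strands still gets a small distance while contributing nothing to GC.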
Figure 2. Predictive power of physical and functional protein-protein interaction prediction methods.
Each point on this plot represents a specific interaction score threshold of a prediction method at which true and false positives were counted. The inset shows the reduction of false positives at higher interaction score cutoffs for the GC, PP, ES and GM prediction methods. The performance of all methods is near the diagonal. Even at its highest score cutoff, GN predicted more than 13,868 TP and 210,963 FP, hence its line is not visible in the inset. Expression Similarity (ES) is the best-performing method. Phylogenetic Profiling, Gene Neighbor, Gene Cluster and Genome Distance-Mirrortree are abbreviated as PP, GN, GC and GM, respectively.
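Each point in this plot is a pair of true and false positive counts at one interaction score threshold. A minimal Python sketch of that counting, assuming a pair is predicted to interact when its score is at or above the cutoff (distance-like outputs, such as a raw GN distance, would first need converting to higher-is-better scores) and that labels come from the pathway positives and negatives; `counts_at_cutoffs` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def counts_at_cutoffs(
    scores: Sequence[float],
    labels: Sequence[bool],
    cutoffs: Sequence[float],
) -> List[Tuple[float, int, int]]:
    """Return (cutoff, TP, FP) per cutoff; a pair is predicted if score >= cutoff."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    result = []
    for cutoff in cutoffs:
        predicted = [y for s, y in zip(scores, labels) if s >= cutoff]
        tp = sum(1 for y in predicted if y)  # predicted and labeled as a pathway pair
        fp = len(predicted) - tp             # predicted but labeled negative
        result.append((cutoff, tp, fp))
    return result
```

Raising the cutoff can only shrink both counts, which is the false positive reduction tracked in the inset.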
Figure 3. Prediction accuracy of physical and functional protein-protein interactions responsible for metabolism.
Each solid colored line represents the Receiver Operating Characteristic (ROC) curve of a method. The gray dotted line represents the performance of a random predictor. Gene Neighbor, Gene Cluster, Expression Similarity, Phylogenetic Profiling and Genome Distance-Mirrortree are abbreviated as GN, GC, ES, PP and GM, respectively. Each inset shows performance in the region of high interaction scores generated by the prediction methods. Amino acid (A), Nucleotide (C), Co-factors & vitamins (E) and Glycan (H) pathways are predicted with comparable accuracy. For the Carbohydrate (B), Lipid (D), Energy (F) and Non-standard amino acids (G) pathways, the prediction accuracy of all methods is near that of the random predictor. GN outperforms the other methods.
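Per-pathway ROC curves, together with panel B of Figure 1, imply two steps: label protein pairs relative to a target pathway, then summarize how well a method's scores separate positives from negatives. This Python sketch assumes "all other pathway protein pairs" means pairs that co-occur in some other benchmark pathway but not in the target, and it reports the area under the ROC curve (AUC) through its rank interpretation, under which a random predictor (the dotted diagonal) scores 0.5. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence, Set, Tuple

Pair = FrozenSet[str]


def _pairs(members: Iterable[str]) -> Set[Pair]:
    """All unordered protein pairs within one pathway."""
    return {frozenset(p) for p in combinations(sorted(members), 2)}


def label_pairs(pathways: Dict[str, Set[str]], target: str) -> Tuple[Set[Pair], Set[Pair]]:
    """Positives: pairs co-occurring in `target`; negatives: pairs from other pathways only."""
    positives = _pairs(pathways[target])
    negatives: Set[Pair] = set()
    for name, members in pathways.items():
        if name != target:
            negatives |= _pairs(members)
    negatives -= positives  # a pair shared with the target pathway stays a positive
    return positives, negatives


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The double loop is quadratic in the number of pairs; for genome-scale pair sets, a sort-based (Mann-Whitney rank) computation gives the same value far faster.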
Figure 4. Prediction accuracy of physical and functional protein-protein interactions responsible for various biological pathways.
Each solid colored line represents the Receiver Operating Characteristic (ROC) curve of a prediction method. The gray dotted line represents the performance of a random predictor. Gene Neighbor, Gene Cluster, Expression Similarity, Phylogenetic Profiling and Genome Distance-Mirrortree are abbreviated as GN, GC, ES, PP and GM, respectively. Translation (A), Folding, sorting & degradation (B), and Replication & repair (C) are well predicted by PP and GM. Signal transduction (D) and Membrane transport (E) pathways are predicted at near-random accuracy by all PPI prediction methods. GM performed better than the other methods in the low false positive region (D & E insets). PP, ES and GN predicted interactions among proteins involved in the Cell motility pathway (F) well.
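Comparing methods "in the low false positive region", as in insets D and E, can be quantified with a partial AUC: the area under the ROC curve only up to a chosen false positive rate. This is a swapped-in summary statistic, not necessarily how the figure was analyzed, and `partial_auc` and its `max_fpr` parameter are hypothetical. For reference, a random predictor's partial AUC up to `max_fpr` is `max_fpr**2 / 2`.

```python
from typing import Sequence


def partial_auc(scores: Sequence[float], labels: Sequence[bool], max_fpr: float) -> float:
    """Unnormalized area under the ROC curve for false positive rates in [0, max_fpr]."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < max_fpr <= 1.0:
        raise ValueError("max_fpr must lie in (0, 1]")
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    # Walk thresholds from the highest score down; tied scores form one ROC step.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    area, tpr, fpr = 0.0, 0.0, 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked) and fpr < max_fpr:
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        new_tpr, new_fpr = tp / n_pos, fp / n_neg
        if new_fpr > max_fpr:
            # Clip the last segment at max_fpr by linearly interpolating TPR.
            frac = (max_fpr - fpr) / (new_fpr - fpr)
            new_tpr = tpr + frac * (new_tpr - tpr)
            new_fpr = max_fpr
        area += (new_fpr - fpr) * (tpr + new_tpr) / 2.0  # trapezoid rule
        tpr, fpr = new_tpr, new_fpr
    return area
```

scikit-learn's `roc_auc_score` also accepts a `max_fpr` argument, but it returns the McClish-standardized partial AUC rather than the raw area computed here.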
