Nucleic Acids Res. 2009 Jul;37(Web Server issue):W84-9.
doi: 10.1093/nar/gkp373. Epub 2009 May 12.

Berkeley PHOG: PhyloFacts orthology group prediction web server

Ruchira S Datta et al. Nucleic Acids Res. 2009 Jul.

Abstract

Ortholog detection is essential in functional annotation of genomes, with applications to phylogenetic tree construction, prediction of protein-protein interaction and other bioinformatics tasks. We present here the PHOG web server, which employs a novel algorithm to identify orthologs based on phylogenetic analysis. Results on a benchmark dataset from the TreeFam-A manually curated orthology database show that PHOG provides a combination of high recall and precision competitive with both InParanoid and OrthoMCL, and allows users to target different taxonomic distances and precision levels through the use of tree-distance thresholds. For instance, OrthoMCL-DB achieved 76% recall and 66% precision on this dataset; at a slightly higher precision (68%), PHOG achieves recall 10 percentage points higher (86%). InParanoid achieved 87% recall at 24% precision on this dataset, while a PHOG variant designed for high recall achieves 88% recall at 61% precision, a precision 37 percentage points higher than InParanoid's. PHOG is based on pre-computed trees in the PhyloFacts resource, and contains over 366 K orthology groups with a minimum of three species. Predicted orthologs are linked to GO annotations, pathway information and biological literature. The PHOG web server is available at http://phylofacts.berkeley.edu/orthologs/.

Figures

Figure 1.
Results of orthology prediction methods assessed on a benchmark dataset from the TreeFam-A resource. Performance was evaluated on 100 human proteins selected from the TreeFam-A manually curated orthology database, with orthologs to each human protein from mouse, zebrafish and fruit fly. Methods evaluated include several PHOG variants, OrthoMCL-DB, InParanoid and SCI-PHY. PHOG-S represents super-orthology predictions, PHOG-O represents standard orthology predictions and PHOG-T represents the tree-distance thresholded variants. PHOG-T variants PHOG-T(M), PHOG-T(Z) and PHOG-T(F) correspond to tree-distance thresholds selected for optimal performance on this dataset for mouse, zebrafish and fruit fly, respectively. Tree distance thresholds were 0.09375 (mouse), 0.296875 (zebrafish) and 0.9375 (fruit fly). SCI-PHY uses hierarchical clustering and encoding cost measures to define functional subtypes and is included for comparison. Recall measures the fraction of TreeFam-A orthologs detected by a method. Precision measures the fraction of a method's predicted orthologs that are included in TreeFam-A. A True Positive (TP) is an orthology pair included in TreeFam-A that is also predicted by a method, a False Positive (FP) is an orthology pair predicted by a method that is not included in TreeFam-A and a False Negative (FN) is a TreeFam-A ortholog that is missed by a method. Left: recall-precision curves over the entire dataset. Right: table of results for each method for individual species as well as over the entire dataset. Values in red highlight the recall and precision for species-specific threshold selections.
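The recall and precision definitions in the Figure 1 caption can be illustrated with a minimal sketch. It assumes orthology predictions are represented as sets of (human protein, ortholog) pairs; the function name `precision_recall` and all protein identifiers below are hypothetical and are not taken from the paper or from the PHOG server.

```python
from typing import Set, Tuple

# An orthology pair: (human protein, ortholog in another species).
Pair = Tuple[str, str]


def precision_recall(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted pairs against a reference set.

    TP: pair in the reference that is also predicted.
    FP: predicted pair that is not in the reference.
    FN: reference pair that the method missed.
    """
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data standing in for a curated benchmark and one method's output.
reference = {
    ("HUMAN_A", "MOUSE_A"),
    ("HUMAN_A", "ZEBRAFISH_A"),
    ("HUMAN_B", "MOUSE_B"),
    ("HUMAN_B", "FLY_B"),
}
predicted = {
    ("HUMAN_A", "MOUSE_A"),
    ("HUMAN_B", "MOUSE_B"),
    ("HUMAN_B", "FLY_B"),
    ("HUMAN_A", "FLY_X"),  # not in the reference: false positive
    ("HUMAN_B", "FLY_C"),  # not in the reference: false positive
}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case there are 3 true positives, 2 false positives and 1 false negative, so precision is 3/5 and recall is 3/4.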
Figure 2.
PhyloFacts ortholog identification pipeline. The input is a protein sequence, either in FASTA format (for BLAST search) or by accession. Results of a sequence accession search are displayed in an Orthology Report including a table of all PHOGs containing the query (F) followed by a table displaying the sequences contained in these PHOGs (G). Links in the columns labelled PhyloFacts Orthology Group retrieve the corresponding PHOG report (E). BLAST results are displayed in an initial table of results (not shown); users then select one of the sequences in the table to retrieve the Orthology Report for their selected sequence. (A) Protein sequence query. In this example, the query sequence consists of two evolutionarily conserved domains: an N-terminal Ig domain (pink) followed by a transmembrane helix and a C-terminal Toll Interleukin Receptor (TIR) domain (blue). (B–D) PhyloFacts trees containing the query sequence are identified, and orthologs are extracted from the orthology group for the sequence (indicated by red subtrees). In this example, the sequence is contained in three PhyloFacts trees. The tree shown in B corresponds to sequences sharing the same overall domain architecture (global homologs). The trees shown in C and D contain sequences that share local (partial) homology along a single domain; the tree in C contains sequences having an Ig domain and the tree in D contains sequences having a TIR domain. (Note that the taxonomic distributions of these PHOGs differ, corresponding to differences in orthology predictions across these domains.) (E) PHOG report: this report displays summary data for the PHOG, followed by a table listing all the orthologs in the PHOG including a link to the sequence database from which the member was drawn, the species of origin, description and links to external resources (e.g. SwissProt, KEGG and BioCyc). (F) List of PHOGs containing the query. This table contains summary data about each PHOG, including PFAM domains, GO annotations and evidence codes and taxonomic distribution. (G) Orthology report: all members of all PhyloFacts orthology groups containing the query are gathered and presented in a table. Note that some orthologs to the query will belong to more than one PHOG (i.e. containing both the ortholog and the query); the column ‘PhyloFacts Orthology Group’ provides a link to the most informative PHOG for each sequence as well as to the PhyloFacts book containing that PHOG. GO annotations and evidence codes, PFAM domains and links to external resources (e.g. SwissProt, KEGG, BioCyc and GO) are also provided. These data are also overlaid on the phylogenetic tree for the PHOG as well as for the family tree from which the PHOG was drawn, and can be viewed using the PhyloScope tree viewer.

