Genome Biol. 2006;7(9):R83.
doi: 10.1186/gb-2006-7-9-r83.

PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification

Nandini Krishnamurthy et al. Genome Biol. 2006.

Abstract

The Berkeley Phylogenomics Group presents PhyloFacts, a structural phylogenomic encyclopedia containing almost 10,000 'books' for protein families and domains, with pre-calculated structural, functional and evolutionary analyses. PhyloFacts enables biologists to avoid the systematic errors associated with function prediction by homology through the integration of a variety of experimental data and bioinformatics methods in an evolutionary framework. Users can submit sequences for classification to families and functional subfamilies. PhyloFacts is available as a worldwide web resource from http://phylogenomics.berkeley.edu/phylofacts.

Figures

Figure 1
PhyloFacts book: Voltage-gated K+ channels, Shaker/Shaw subtypes. Each book contains summary data at the top of the book page, including book type, number of sequences, number of predicted subfamilies, and taxonomic distribution. PFAM domains matching the book consensus sequence are displayed along with predicted transmembrane domains and signal peptides. Phylogenetic trees and multiple sequence alignments can be viewed or downloaded, for the family as a whole or for individual subfamilies. Predicted critical residues have been identified and are plotted on homologous PDB structures, where available (Figure 5). Clicking on 'View annotations and sequence headers' displays GO annotations and evidence codes for sequences in the family as a whole and for individual subfamilies.
Figure 2
PhyloFacts search results for ANDR_RAT, androgen receptor from Rattus norvegicus. Books with significant scores are displayed graphically at top, followed by various statistics about each match in a table below. The top-scoring book (red bar) represents a global homology group of Androgen receptors, which matches the entire query sequence. Examining the table below shows the Androgen receptor book has an E-value of 2.71e-162, 91% identity between the query and book consensus (based on aligned residues), and high fractional coverage of the HMM (99%). Other global homology groups retrieved include evolutionarily related Glucocorticoid and Progesterone receptors, but analysis of query coverage and percent identity shows the Androgen receptor book to provide a superior basis for annotation transfer. Other books displayed include structural domains detected in the query. Two books (for the ligand-binding domain 1kv6a and the DNA-binding domain 1dsza) were constructed for the Structure Prediction series based on SCOP domains. Subsequent construction of the specialized book series on transmembrane receptors in the human genome resulted in additional books being constructed for these domains. Scoring subfamily HMMs is enabled by selecting the 'Search subfamilies' box (second column in the spreadsheet of results, shown checked in the figure), and clicking on the 'Go' button at bottom ('Search selected books for top-scoring subfamily HMMs against query'). Clicking on the 'Go' button below 'View alignment' in the first column brings up a separate page displaying the pairwise alignment of the query and the family consensus sequence along with relevant statistics about the alignment. Clicking on the hyperlink to the book itself (in the 'PhyloFacts book' column) retrieves the webpage for the family (see example book page shown in Figure 1).
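The reasoning in this caption (prefer the book that covers the whole query with high identity, rather than the book with the best score alone) can be sketched in a few lines of Python. This is an illustration only, not the PhyloFacts implementation: the `BookHit` class, the `best_annotation_source` function, and the cutoffs are hypothetical. Only the Androgen receptor E-value, identity, and HMM coverage come from the caption; its query coverage is set to 1.0 because the caption says the book matches the entire query, and all values for the other two books are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BookHit:
    book: str
    e_value: float
    percent_identity: float  # query vs. book consensus, over aligned residues
    hmm_coverage: float      # fraction of the HMM aligned to the query
    query_coverage: float    # fraction of the query aligned to the HMM


def best_annotation_source(hits: List[BookHit],
                           max_e_value: float = 1e-3,
                           min_coverage: float = 0.7) -> Optional[BookHit]:
    """Pick the hit best suited for annotation transfer (hypothetical rule).

    A hit must be significant and must cover most of both the query and
    the HMM; among the survivors, the highest percent identity wins.
    """
    candidates = [
        h for h in hits
        if h.e_value <= max_e_value
        and h.hmm_coverage >= min_coverage
        and h.query_coverage >= min_coverage
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.percent_identity)


hits = [
    # Values from the Figure 2 caption (query coverage assumed to be 1.0).
    BookHit("Androgen receptors", 2.71e-162, 91.0, 0.99, 1.0),
    # Hypothetical values for a related global homology group.
    BookHit("Glucocorticoid receptors", 1e-80, 55.0, 0.95, 0.60),
    # Hypothetical values for a domain book: high identity, partial query.
    BookHit("Ligand-binding domain 1kv6a", 1e-90, 95.0, 0.99, 0.28),
]

best = best_annotation_source(hits)
```

The domain book has the highest identity in this toy data, but it covers only part of the query, so the coverage filter removes it before identity is compared.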
Figure 3
PhyloFacts whole-genome library construction pipeline. This figure represents our protocol for building global homology group protein family books. The pipeline starts by clustering a target genome into global homology groups (GHGs; sequences sharing the same overall domain structure) and proceeds through cluster expansion, multiple sequence alignment, phylogenetic tree construction, retrieval of experimental data, application of a variety of bioinformatics methods for predicting functional subfamilies, key residues, cellular localization, and so on, and finally quality control assessment.
Figure 4
SCI-PHY subfamilies correspond closely to conserved phylogenetic clades. Shown here is the Maximum Likelihood (ML) tree and SCI-PHY subfamilies for the PhyloFacts book 'Voltage-gated K+ channels, Shaker/Shaw subtypes'. A branch of the ML tree is displayed, labeled with the corresponding SCI-PHY subfamilies. Subtrees containing sequences from a single subfamily are colored to show the correspondence between the SCI-PHY subfamilies and the ML tree.
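One way to make the notion of a subfamily "corresponding to a clade" concrete is to test whether its members are exactly the leaves of some subtree. The sketch below assumes a toy tree written as nested tuples with hypothetical sequence names; `leaves` and `is_clade` are illustrative helpers, not part of SCI-PHY or PhyloFacts, and real trees would normally be parsed from Newick files.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_clade(tree, members):
    """True if `members` is exactly the leaf set of some subtree."""
    members = set(members)
    if leaves(tree) == members:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, members) for child in tree)


# Toy tree with hypothetical sequence names.
toy_tree = ((("s1", "s2"), "s3"), ("s4", "s5"))

subfamily_a = {"s1", "s2", "s3"}   # matches the left subtree
subfamily_b = {"s4", "s5"}         # matches the right subtree
mixed_group = {"s3", "s4"}         # spans both subtrees

a_ok = is_clade(toy_tree, subfamily_a)
b_ok = is_clade(toy_tree, subfamily_b)
mixed_ok = is_clade(toy_tree, mixed_group)
```

In this toy case the first two groups each map onto a single subtree, while the mixed group does not.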
Figure 5
Key residue prediction using SCI-PHY subfamily-specific and family-wide conservation patterns. Shown above is the PDB structure for the Pyrococcus furiosus Argonaute protein (PDB structure 1Z26A), from the PhyloFacts book Argonaute III (Archaea-Eukarya). The structure has been colored to predict functional residues. Residues colored yellow are conserved both within subfamilies and across the family as a whole. Positions conserved only within individual subfamilies but not across the family are colored dark blue. Positions having sufficient conservation across the family, but potentially variable within one or more subfamilies, are colored light blue. These conservation patterns are predicted for each book in the PhyloFacts resource; where homologous PDB structures can be identified, these patterns are plotted on the structure. Users can modify cutoffs for determining significance using the boxes at right. Most of the residues highlighted automatically by our conservation analyses, based on the default cutoffs, have been determined experimentally to be part of the active site [80-82] (labeled manually for this figure: R627, D628, G629, D558, Y743, H745). Y221 and W222 represent a prediction by this server. Structure viewing and interaction is enabled by the Jmol software.
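The three color classes in this caption can be illustrated with a small sketch that scores each alignment column by the frequency of its most common residue. This is a simplified stand-in, not the scoring used by PhyloFacts: the `conservation` and `classify_columns` functions, the two cutoffs, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter


def conservation(column):
    """Fraction of non-gap residues equal to the most common residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(residues)


def classify_columns(subfamilies, family_cutoff=0.8, subfamily_cutoff=0.9):
    """Label each alignment column with the color class from the caption.

    `subfamilies` maps a subfamily name to its aligned sequences, all of
    equal length. The cutoffs are hypothetical defaults.
    """
    all_seqs = [s for seqs in subfamilies.values() for s in seqs]
    labels = []
    for i in range(len(all_seqs[0])):
        family_ok = conservation([s[i] for s in all_seqs]) >= family_cutoff
        subfamilies_ok = all(
            conservation([s[i] for s in seqs]) >= subfamily_cutoff
            for seqs in subfamilies.values()
        )
        if family_ok and subfamilies_ok:
            labels.append("yellow")         # conserved at both levels
        elif subfamilies_ok:
            labels.append("dark blue")      # subfamily-specific only
        elif family_ok:
            labels.append("light blue")     # family-wide, variable in a subfamily
        else:
            labels.append("unhighlighted")
    return labels


# Toy alignment with two hypothetical subfamilies of three sequences each.
toy_subfamilies = {
    "A": ["DKGA", "DKGS", "DKGT"],
    "B": ["DRGA", "DRGA", "DRAA"],
}
labels = classify_columns(toy_subfamilies)
# labels == ["yellow", "dark blue", "light blue", "unhighlighted"]
```

Column 2 is the informative case: K is fixed in subfamily A and R in subfamily B, so the position is conserved within each subfamily but not across the family, which is the dark blue pattern.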

References

    1. Bork P, Koonin EV. Predicting functions from protein sequences - where are the bottlenecks? Nat Genet. 1998;18:313–318. doi: 10.1038/ng0498-313.
    2. Galperin MY, Koonin EV. Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol. 1998;1:55–67.
    3. Gerlt JA, Babbitt PC. Can sequence determine function? Genome Biol. 2000;1:REVIEWS0005. doi: 10.1186/gb-2000-1-5-reviews0005.
    4. Fitch WM. Distinguishing homologous from analogous proteins. Syst Zool. 1970;19:99–113. doi: 10.2307/2412448.
    5. Kaessmann H, Zollner S, Nekrutenko A, Li WH. Signatures of domain shuffling in the human genome. Genome Res. 2002;12:1642–1650. doi: 10.1101/gr.520702.
