2008 Jan 25:9:51.
doi: 10.1186/1471-2105-9-51.

The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction

Jonathan R Manning et al. BMC Bioinformatics.

Abstract

Background: Amino acids responsible for structure, core function or specificity may be inferred from multiple protein sequence alignments where a limited set of residue types are tolerated. The rise in available protein sequences continues to increase the power of techniques based on this principle.

Results: A new algorithm, SMERFS, for predicting protein functional sites from multiple sequence alignments was compared to 14 conservation measures and to the MINER algorithm. Validation was performed on an automatically generated dataset of 1457 families derived from the protein interactions database SNAPPI-DB, and a smaller manually curated set of 148 families. The best performing measure overall was Williamson property entropy, with ROC0.1 scores of 0.0087 and 0.0114 for domain and small molecule contact prediction, respectively. The Lancet method performed worse than random on protein-protein interaction site prediction (ROC0.1 score of 0.0008). The SMERFS algorithm gave similar accuracy to the phylogenetic tree-based MINER algorithm but was superior to Williamson in prediction of non-catalytic transient complex interfaces. SMERFS predicts sites that are significantly more solvent accessible than those predicted by Williamson.

Conclusion: Williamson property entropy is the best performing of the 14 conservation measures examined. The difference in performance of SMERFS relative to Williamson in manually defined complexes was dependent on complex type. The best choice of analysis method is therefore dependent on the system of interest. Additional computation employed by MINER in calculation of phylogenetic trees did not produce improved results over SMERFS. SMERFS performance was improved by use of windows over alignment columns, illustrating the necessity of considering the local environment of positions when assessing their functional significance.
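The abstract reports ROC0.1 scores without defining them. A minimal sketch, assuming ROC0.1 denotes the area under the ROC curve restricted to false positive rates between 0 and 0.1, is given below. The function name `partial_roc_area` and its parameters are hypothetical and are not taken from the paper. Under this assumption an uninformative measure scores 0.1 × 0.1 / 2 = 0.005, which would be consistent with the abstract describing 0.0008 as worse than random and 0.0087 and 0.0114 as better.

```python
def partial_roc_area(scores, labels, max_fpr=0.1):
    """Area under the ROC curve for FP rates in [0, max_fpr] (trapezoidal rule).

    scores: one prediction score per alignment position (higher = more likely functional).
    labels: truthy for known functional positions, falsy otherwise.
    Assumption: this is one plausible reading of "ROC0.1", not the paper's stated definition.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative position")

    # Rank positions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    area = 0.0
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Treat all positions sharing a score as a single threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr, tpr = fp / n_neg, tp / n_pos
        if fpr >= max_fpr:
            # Interpolate linearly to the FP-rate cut-off, then stop.
            if fpr > prev_fpr:
                frac = (max_fpr - prev_fpr) / (fpr - prev_fpr)
                tpr_cut = prev_tpr + frac * (tpr - prev_tpr)
                area += (max_fpr - prev_fpr) * (prev_tpr + tpr_cut) / 2
            return area
        area += (fpr - prev_fpr) * (prev_tpr + tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
        i = j
    return area
```

With this definition a perfect ranking reaches the maximum of 0.1, and a measure that assigns the same score to every position reaches 0.005, the value expected along the diagonal of the ROC plot.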

Figures

Figure 1
Fragment of an Example Alignment. Illustration of the difference between highly conserved positions likely to be responsible for core structure and/or function and specificity-defining (SD) positions of a multiple sequence alignment. Columns 1 and 2 illustrate positions crucial in all family members. Column 3 shows a similar, though less stringent global pattern of conservation. Column 4 in contrast represents an SD position, where only a single amino acid is tolerated by each subfamily. Column 5 represents a non-conserved position for comparison.
Figure 2
Partial ROC Plots from the BW05 Data. Partial ROC plots illustrating the difference in method performance in each of the four categories of interface that comprise the Bradford data set. The straight line shown represents the ratio of TP rate to FP rate expected from a randomly generated measure.
Figure 3
Surface Accessibility of Domain Residues. Illustration of surface accessibility of positions associated with functional residue prediction in the four different interface subtypes of the BW05 validation data [37]. Distributions of relative solvent accessibility (RSA) are shown divided into 3 bins: buried (B, 0 ≤ RSA < 5), partly buried (PB, 5 ≤ RSA < 25), and accessible (A, 25 ≤ RSA < 100). Of the three columns of panels, the far left (labeled 'Interacting Positions') represents all positions found interacting in the BW05 dataset. Centre panels illustrate positions over the domain as a whole, while right-hand panels are derived from SMERFS or Williamson predictions. The lighter 'caps' on bars in the 'Predictions' column represent the portion that corresponds to interactions in the BW05 set. Rows describe the sub-types of hetero-obligates, homo-obligates, enzyme-inhibitors and non-enzyme inhibitor transient (NEIT).
Figure 4
Complex Structure of PDB Structure 1BW0, Representing Pfam Family PF00155. Illustration of the dimer structure for Pfam family PF00155 (aminotransferase class I and II) in the Trypanosoma cruzi structure (PDB code 1BW0, [38]). The complex comprises 2 chains: chain A is shown in dark grey, chain B in lighter grey. The two PLP cofactor molecules are shown in orange, and each has contacts with both chains.
Figure 5
PF00155 seed alignment excerpt with SMERFS-predicted positions shown in red.
Figure 6
Ligand-binding Positions of Tyrosine Aminotransferase of Trypanosoma cruzi. One chain of the crystal structure of tyrosine aminotransferase from Trypanosoma cruzi (PDB code 1BW0). Results of a conservation-based measure (Williamson, in blue) are shown compared to the phylogeny-based SMERFS (in red). Positions predicted by both techniques are shown in green, the PLP cofactor in orange. Labelled protein regions shown in stick representation are those important for cofactor binding, as described in the text.
Figure 7
Structure as Figure 6, highlighting the domain-domain interface in pink.
Figure 8
Illustration of the SMERFS Algorithm. Pfam family PF03120 is shown with a trace resulting from a SMERFS run with a window size of 7. Red highlighting on the alignment shows the known locations of interactions with other domains. See text for details.
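The RSA bins given in the Figure 3 caption can be sketched as a small classifier. This is an illustrative sketch only: the function names `rsa_bin` and `rsa_distribution` are hypothetical, and rejecting values outside the caption's stated range of 0 to 100 is an assumption rather than something the caption specifies.

```python
from collections import Counter


def rsa_bin(rsa):
    """Map a relative solvent accessibility value to the Figure 3 bin label.

    Bins follow the caption: B (0 <= RSA < 5), PB (5 <= RSA < 25), A (25 <= RSA < 100).
    Assumption: values outside [0, 100) are rejected rather than clamped.
    """
    if rsa < 0 or rsa >= 100:
        raise ValueError(f"RSA {rsa} is outside the range covered by the caption")
    if rsa < 5:
        return "B"   # buried
    if rsa < 25:
        return "PB"  # partly buried
    return "A"       # accessible


def rsa_distribution(values):
    """Return the fraction of positions falling in each bin."""
    values = list(values)
    if not values:
        raise ValueError("no RSA values supplied")
    counts = Counter(rsa_bin(v) for v in values)
    return {label: counts[label] / len(values) for label in ("B", "PB", "A")}
```

Calling `rsa_distribution` once on the positions predicted by a method and once on all positions in the domain gives two sets of fractions that can be compared bin by bin.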

References

    1. Genome Pages at the EBI. http://www.ebi.ac.uk/genomes/
    2. Do JH, Choi DK. Computational approaches to gene prediction. J Microbiol. 2006;44:137–144.
    3. Martin DMA, Berriman M, Barton GJ. GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics. 2004;5:178. doi: 10.1186/1471-2105-5-178.
    4. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
    5. Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucleic Acids Res. 2004:W424–8. doi: 10.1093/nar/gkh391.
