Protein Sci. 2007 Jan;16(1):4-13.
doi: 10.1110/ps.062506407.

Analysis and prediction of functionally important sites in proteins

Saikat Chakrabarti et al. Protein Sci. 2007 Jan.

Abstract

The rapidly increasing volume of sequence and structure information available for proteins poses the daunting task of determining their functional importance. Computational methods can prove to be very useful in understanding and characterizing the biochemical and evolutionary information contained in this wealth of data, particularly at functionally important sites. Therefore, we perform a detailed survey of compositional and evolutionary constraints at the molecular and biological function level for a large set of known functionally important sites extracted from a wide range of protein families. We compare the degree of conservation across different functional categories and provide detailed statistical insight to decipher the varying evolutionary constraints at functionally important sites. The compositional and evolutionary information at functionally important sites has been compiled into a library of functional templates. We developed a module that predicts functionally important columns (FIC) of an alignment based on the detection of a significant "template match score" to a library template. Our template match score measures an alignment column's similarity to a library template and combines a term explicitly representing a column's residue composition with various evolutionary conservation scores (information content and position-specific scoring matrix-derived statistics). Our benchmarking studies show good sensitivity/specificity for the prediction of functional sites and high accuracy in attributing correct molecular function type to the predicted sites. This prediction method is based on information derived from homologous sequences and no structural information is required. Therefore, this method could be extremely useful for large-scale functional annotation.
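One of the conservation scores named in the abstract is the information content of an alignment column. The abstract does not give the formula the authors use, so the following is only a minimal sketch of one common definition: the Shannon entropy of the column's residue frequencies subtracted from the maximum entropy of a uniform 20-letter alphabet. The function name `column_information_content`, the uniform background, and the choice to skip gaps are all assumptions made for illustration.

```python
import math
from collections import Counter

# Standard 20 amino acids; gaps and ambiguity codes are ignored (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information_content(column: str) -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(20) minus the Shannon entropy of the observed
    residue frequencies, assuming a uniform background distribution.
    Returns 0.0 for a column that contains no standard residues.
    """
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(AMINO_ACIDS)) - entropy


if __name__ == "__main__":
    # A fully conserved column scores highest; a mixed column scores lower.
    for col in ("AAAAAAAA", "AAAAGGGG", "ACDEFGHI"):
        print(col, round(column_information_content(col), 3))
```

Under this definition a fully conserved column reaches the maximum of log2(20) bits and a column with all 20 residues equally represented scores zero. A background-corrected variant (relative entropy against observed amino acid frequencies) would be a natural alternative, and the paper may use a different formulation.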


Figures

Figure 1.
Degree of conservation across molecular functional categories. Degree of conservation of FICs is compared across six molecular functional categories (see Table 1 and text for details) based on its median conservation score mc (a), rate of substitution Rc (b), and information content Ic (c). The central line in each box shows the median value, the upper and lower boundaries of an individual box show the upper and lower quartiles, and the vertical lines extend to a value of 1.5 times the interquartile range. Outlier values are shown outside the whiskers.
Figure 2.
Degree of conservation across biological functional categories. Degree of conservation of FICs is compared across 16 biological functional categories (see Table 2 and the text for details) based on its median conservation score mc (upper panel), rate of substitution Rc (middle panel), and information content Ic (lower panel).
Figure 3.
Examples of functionally important site prediction. (a) Cartoon representation of response regulator protein PleD (PDB code 1W25). Correctly predicted active sites and inhibitor binding sites are colored in cyan; functional sites missed by our prediction module are marked in either purple (active sites) or orange (inhibitor binding sites). (b) Structure of the DNA (ligand) binding PhoB effector domain (PDB code 1GXP, chain A). (c) Structure of a Ran-binding protein, Mog1p. Correctly predicted functional sites for ligand binding (b) and protein binding (c) are colored in cyan, whereas functional sites that are missed by our prediction module are marked in purple. (d) Structure of human Long-[Arg3] insulin-like growth factor 1 (PDB code 3LRI, chain A, region 17–74). All predicted functionally important sites are colored in cyan, whereas correctly predicted sites are marked in stick representation.
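The Figure 1 caption describes a conventional box plot: median line, quartile box, whiskers tied to 1.5 times the interquartile range, and outliers beyond them. As a rough sketch of how such a summary could be computed for a list of per-column conservation scores, the snippet below uses Python's standard `statistics` module. The function name `box_plot_summary`, the "inclusive" quartile method, and the choice to end each whisker at the most extreme data point inside the fence are assumptions; the caption does not state which conventions the authors' plotting software applied.

```python
import statistics


def box_plot_summary(scores):
    """Summarize scores the way a Tukey-style box plot would.

    Whiskers end at the most extreme data points that lie within
    1.5 * IQR of the box; anything beyond is reported as an outlier.
    Requires at least two values.
    """
    data = sorted(scores)
    # Quartiles via linear interpolation over the observed data range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical scores, not data from the paper.
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Different quartile conventions shift the box edges slightly on small samples, so values read off a published figure may not match this sketch exactly.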
