The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

Inbal Halperin¹, Dariya S Glazer, Shirley Wu, Russ B Altman

PMID: 18831785
PMCID: PMC2559884
DOI: 10.1186/1471-2164-9-S2-S2

Inbal Halperin et al. BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S2.

Affiliation

¹ Department of Genetics, 318 Campus Drive, Clark Center S240, Stanford, CA 94305, USA. inbal@helix.stanford.edu

Abstract

Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence- and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity (such as molecular dynamics simulations and loop modeling) to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.
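Automatic training set selection from annotated hetero-groups can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not FEATURE's actual selection procedure: positive sites are taken to be the coordinates of a named atom in HETATM groups with a given residue name (calcium ions, for example, use residue name `CA` and atom name `CA`), and negative sites are protein atom positions sampled at least `min_sep` Å from every positive, with a hypothetical 10 Å default. The helpers `parse_atoms` and `select_training_sites` are illustrative names; only the fixed-column layout of PDB ATOM/HETATM records is standard.

```python
import math
import random


def parse_atoms(pdb_lines):
    """Yield (record, atom_name, res_name, (x, y, z)) from ATOM/HETATM lines using fixed PDB columns."""
    for line in pdb_lines:
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield record, line[12:16].strip(), line[17:20].strip(), xyz


def select_training_sites(pdb_lines, het_name, atom_name, n_neg, min_sep=10.0, seed=0):
    """Hypothetical selector.

    Positives: coordinates of atom_name in HETATM groups named het_name.
    Negatives: up to n_neg protein (ATOM) positions at least min_sep angstroms
    from every positive (min_sep is an assumed cutoff, not a FEATURE default).
    """
    atoms = list(parse_atoms(pdb_lines))
    pos = [xyz for rec, name, res, xyz in atoms
           if rec == "HETATM" and res == het_name and name == atom_name]
    candidates = [xyz for rec, _, _, xyz in atoms
                  if rec == "ATOM" and all(math.dist(xyz, p) >= min_sep for p in pos)]
    rng = random.Random(seed)  # fixed seed so the sampled negatives are reproducible
    neg = rng.sample(candidates, min(n_neg, len(candidates)))
    return pos, neg
```

Sampling negatives from real atom positions keeps them inside the structure; the separation cutoff and the absence of further filtering are simplifying assumptions of this sketch.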

Figures

**Figure 1**
**Simplified example for building a FEATURE model**. A. An example of a positive site (left) and negative site (right), and their respective microenvironments. Properties are calculated in concentric spherical shells centered on each site (star symbol). B. FEATURE vectors calculated from the images in A, with oxygen atom count being the first property, and carbon atom count the second. The vectors are divided by shell for clarity. C. An example of a visualized FEATURE model is shown, based on the FEATURE vectors in B, and images in A. In Shell 2, oxygen atoms are more abundant in the positive site (5 counts) than in the negative site (1 count) and so oxygen atom count is considered a significantly enriched property in Shell 2 of the model. In contrast, carbon atom count is less abundant in the positive site (0 counts) compared to the negative site (8 counts), so carbon atom count is considered a significantly depleted property in Shell 2 of the model. In Shell 3, both the positive and the negative sites have 1 oxygen atom, so the model contains no significant difference for oxygen atom count in Shell 3.
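The shell-based counting in Figure 1 can be reproduced with a toy example. This Python sketch assumes six concentric shells of 1.25 Å each (6 × 1.25 Å = 7.5 Å, the radius mentioned in Figure 3) and only the two properties from the figure, oxygen and carbon atom counts; FEATURE's real property set is much larger. Shells are numbered from 1 as in the figure. Comparing one positive site against one negative site by raw counts is a simplification; a trained model would assess significance statistically across many training sites. All names here are hypothetical.

```python
import math

SHELL_WIDTH = 1.25        # assumed shell thickness in angstroms
NUM_SHELLS = 6            # assumed shell count (6 x 1.25 A = 7.5 A)
PROPERTIES = ("O", "C")   # toy properties: oxygen and carbon atom counts


def shell_number(center, xyz):
    """Return the 1-based shell containing xyz, or None if it lies beyond the last shell."""
    shell = int(math.dist(center, xyz) // SHELL_WIDTH) + 1
    return shell if shell <= NUM_SHELLS else None


def feature_vector(center, atoms):
    """Count each property in each shell; atoms is a list of (element, (x, y, z))."""
    counts = {(s, p): 0 for s in range(1, NUM_SHELLS + 1) for p in PROPERTIES}
    for element, xyz in atoms:
        shell = shell_number(center, xyz)
        if shell is not None and element in PROPERTIES:
            counts[(shell, element)] += 1
    return counts


def compare_sites(pos_vec, neg_vec):
    """Label each (shell, property) as enriched, depleted, or none for one positive vs one negative site."""
    labels = {}
    for key, pos_count in pos_vec.items():
        neg_count = neg_vec[key]
        if pos_count > neg_count:
            labels[key] = "enriched"
        elif pos_count < neg_count:
            labels[key] = "depleted"
        else:
            labels[key] = "none"
    return labels
```

Feeding in the site contents described in the caption (five oxygens in Shell 2 of the positive site versus one oxygen and eight carbons in Shell 2 of the negative site, and one oxygen in Shell 3 of each) labels oxygen in Shell 2 as enriched, carbon in Shell 2 as depleted, and oxygen in Shell 3 as showing no difference.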

**Figure 2**
**FEATURE framework overview**. The outline of the steps necessary to predict a possible function for a protein is illustrated. To build a FEATURE model, one must first define the function of interest and create positive and negative training sets from the appropriate data sources. The model is then trained and evaluated on the training sets, and the validated model can be used for function prediction. Certain steps in the outline, such as extracting training sets and building the model, are straightforward, as described in section "An overview of the FEATURE system". Other steps, such as choosing the data sources for training sites and applying the models, are more flexible. For example, training sites may be derived manually or selected automatically using annotated hetero-groups or sequence motifs. In addition, the resulting models can be applied to static structures from the PDB or to structure prediction decoys, or used for dynamic function prediction over ensembles of structures generated by molecular dynamics simulation.
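The train, evaluate, and predict steps outlined in Figure 2 can be illustrated with a stand-in classifier. The Python below is a hedged sketch, not FEATURE's scoring function: it fits a naive Bayes model over binarized properties (each property simply present or absent, an assumption made for brevity) with Laplace smoothing, scores a site as the sum of per-property log-likelihood ratios, and reports sensitivity and specificity at a chosen threshold. The names `train`, `score`, and `evaluate` are hypothetical.

```python
import math


def train(pos, neg):
    """Fit per-property log-likelihood ratios from binary vectors of positive and negative sites."""
    weights = []
    for j in range(len(pos[0])):
        # Laplace-smoothed probability that property j is present among sites / non-sites
        p = (sum(v[j] for v in pos) + 1) / (len(pos) + 2)
        q = (sum(v[j] for v in neg) + 1) / (len(neg) + 2)
        # Weight used when the property is present, and when it is absent
        weights.append((math.log(p / q), math.log((1 - p) / (1 - q))))
    return weights


def score(weights, vec):
    """Sum the log-likelihood ratios for a binary property vector; higher means more site-like."""
    return sum(w_present if x else w_absent for (w_present, w_absent), x in zip(weights, vec))


def evaluate(weights, pos, neg, threshold):
    """Return (sensitivity, specificity) of the rule score >= threshold on labeled sites."""
    tp = sum(score(weights, v) >= threshold for v in pos)
    tn = sum(score(weights, v) < threshold for v in neg)
    return tp / len(pos), tn / len(neg)
```

Scoring the training vectors themselves gives an optimistic estimate; a held-out split or cross-validation would be the more cautious choice before applying the model for prediction.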

**Figure 3**
**Illustration of the potential value of combining FEATURE models**. A. An ATP binding pocket in PDB structure 1CSN. Enlarged are N6 (blue) and PG (yellow) atoms in ATP. B. Parts of the molecule considered by a putative FEATURE model centered on N6 with shells out to 7.5 Å. Such a model might have poor ability to separate positive sites from negative sites, as shown in the histogram on the right, with substantial overlap of (red) positive sites and (blue) negative sites. C. Parts of the molecule considered by a putative FEATURE model centered on PG with shells out to 7.5 Å. Again, such a model might have poor discriminating ability, as shown in the score distributions on the right for (red) positive sites and (blue) negative sites. D. Parts of the molecule considered by an analysis that combines the two marginal models in B and C. By evaluating hits to multiple models along with appropriate distance constraints, it is possible to achieve better combined performance than either single model alone, as shown in the putative plot on the right.
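The combination idea in Figure 3 can be sketched as a pairwise filter over hits from two marginal models. In this Python sketch, each hit is a (score, coordinates) pair; a pair of hits is kept only if both scores pass per-model thresholds and their separation falls within a distance window, and the combined score is simply the sum of the two scores. The summed score, the thresholds, the window bounds, and the helper name `combine_hits` are all assumptions for illustration; the figure does not specify how hits are merged.

```python
import math
from itertools import product


def combine_hits(hits_a, hits_b, min_dist, max_dist, threshold_a, threshold_b):
    """Pair hits from two models (each a list of (score, (x, y, z))) under a distance constraint.

    Returns (combined_score, hit_a, hit_b) tuples, best first, keeping only pairs where
    both hits pass their thresholds and min_dist <= separation <= max_dist (angstroms).
    """
    pairs = []
    for hit_a, hit_b in product(hits_a, hits_b):
        (score_a, xyz_a), (score_b, xyz_b) = hit_a, hit_b
        if score_a < threshold_a or score_b < threshold_b:
            continue
        if min_dist <= math.dist(xyz_a, xyz_b) <= max_dist:
            # Assumed combination rule: add the two marginal scores
            pairs.append((score_a + score_b, hit_a, hit_b))
    return sorted(pairs, reverse=True)
```

With hypothetical N6 and PG hit lists, an N6 hit that has no PG hit at a compatible distance is discarded even if its own score is high, which is how a distance constraint can suppress false positives that either marginal model would accept on its own.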

**Figure 4**
**NMR ensemble scanning results for PDB structure 2B1O**. 2B1O is a structure of a protein known to bind calcium (Ca²⁺). The NMR ensemble for 2B1O contains different conformations of the structure, some of which show different proclivities for binding Ca²⁺. A shows 10 NMR-generated structures for one of the known Ca²⁺-binding loops, superimposed to minimize RMSD; B shows loops that FEATURE does not identify as Ca²⁺-binding, corresponding to NMR models 1, 3, 4, 5, 6, and 10; and C shows loops that FEATURE does identify as Ca²⁺-binding, corresponding to NMR models 2, 7, 8, and 9. In B and C, side chains in the vicinity of the FEATURE hits are shown for the highest-scoring NMR model (score ~39 for B and ~64 for C). In C, one of the hits that scored over the model threshold of 50 is shown as a yellow ball. Note the differences in side-chain conformations between B and C: the entire loop is wider in C, and coordinating oxygens form a ring around the hit, whereas in B they are more scattered. The conformation of a phenylalanine ring also differs: it essentially blocks the Ca²⁺-binding spot in B but is rotated away from the site in C, allowing possible Ca²⁺ binding.
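Scanning an NMR ensemble, as in Figure 4, amounts to scoring each model separately and recording which models contain a hit above the model threshold (50 in the figure). The Python sketch below assumes the per-model scores have already been computed by a FEATURE model for candidate points around the loop; `scan_ensemble` is a hypothetical helper, and the scores in the usage line are made up.

```python
def scan_ensemble(model_scores, threshold=50.0):
    """Return the sorted NMR model numbers with at least one score above threshold,
    and the fraction of the ensemble those models represent.

    model_scores maps an NMR model number to a list of scores for candidate points.
    """
    hits = sorted(m for m, scores in model_scores.items()
                  if any(s > threshold for s in scores))
    fraction = len(hits) / len(model_scores) if model_scores else 0.0
    return hits, fraction


# Made-up scores for a three-model ensemble (illustration only)
example = {1: [39.0, 12.5], 2: [64.0, 41.0], 3: []}
print(scan_ensemble(example))
```

Reporting which conformers carry a hit, rather than only whether any conformer does, preserves the per-model distinction shown in panels B and C.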
