PLoS One. 2008 May 7;3(5):e2136.
doi: 10.1371/journal.pone.0002136.

De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features

R Matthew Ward et al. PLoS One.

Abstract

Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is growing rapidly, we asked whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). We therefore built a series of algorithms to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) assess their geometric and evolutionary similarity to other structures; and (c) transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had been only 80% accurate and was not scalable, here a fast new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively (in one case even identifying an annotation error), while maintaining sensitivity (approximately 60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes, where it reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on the first pass and 62 more on the second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
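
The reciprocity and plurality steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of matches as (query, target) pairs, and the rule that a tie yields no annotation are all assumptions made for illustration, and template extraction and geometric matching are assumed to have been done already.

```python
from collections import Counter


def reciprocal_matches(forward, backward):
    """Keep only (query, target) pairs found in both search directions.

    forward:  set of (query, target) pairs, from searching the query's
              template against target structures.
    backward: set of (target, query) pairs, from searching each target's
              template against the query structure.
    Both inputs are hypothetical precomputed match sets.
    """
    return {(q, t) for (q, t) in forward if (t, q) in backward}


def plurality_annotation(matches, annotations):
    """Return the annotation carried by the most matched targets.

    Assumption: a strict plurality is required, so a tie between the
    top two annotations returns None (no annotation is transferred).
    """
    votes = Counter(annotations[t] for (_, t) in matches if t in annotations)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no plurality reached
    return ranked[0][0]


# Illustrative, made-up identifiers and labels (not real proteins or EC numbers).
forward = {("query", "t1"), ("query", "t2"), ("query", "t3")}
backward = {("t1", "query"), ("t2", "query")}
labels = {"t1": "function-A", "t2": "function-A", "t3": "function-B"}

kept = reciprocal_matches(forward, backward)
print(plurality_annotation(kept, labels))
```

Treating reciprocity as a set intersection keeps the filter independent of how the matches were scored, which is why it can be applied after either search direction has been run at scale.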

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Matching Strategies.
Schematic overview of the three matching strategies. 1a, one-to-many matching; 1b, many-to-one matching; 1c, the two superimposed. Lines represent template searches; arrows, matches; bold lines, correct matches; other lines, incorrect matches; X's, no match. Purple spheres are residues in both the source and target template and match; red spheres, residues in the query template and target match; blue spheres, residues in the target template and query match.
Figure 2. Example of Evolutionary Trace Annotation.
Illustration of a source protein (2a, PDB 1yvw, chain A), its ET cluster (yellow), residues chosen as a template from that cluster (red), and the Cα atoms which define the geometry of the template (blue); and its functionally relevant match in a target protein (2b, PDB 2a7w, chain A), with corresponding match residues (red) and Cα atoms (blue).
Figure 3. Matches to the PSI Test Set.
The number of true and false matches to the PSI test set before and after reciprocal filtering is shown. The top ovals show the number of true and false matches found by each method alone, with the number of query proteins in parenthesis, and the true/false enrichment ratios below. The bottom ovals show the same data with reciprocity imposed, taking the intersection of the matches found by each method.
Figure 4. ETA and Sequence Identity.
ETA performance on the PSI Test Set is shown, removing matches above a sequence identity cutoff to explore the importance of matches with varying levels of similarity. Sensitivity (black diamonds) is the percentage of the 49 proteins for which ETA predicts a correct function; accuracy (blue circles) is the percentage of these predictions that are correct.
Figure 5. EC YihX and Matches.
Comparison of structures and template/match residues for query 2b0c, chain A (4a and 4b, orange), from the Toronto Set versus targets 1x42, chain A (4a, green), and 1zrn (4b, yellow). Purple spheres, residues in both the source and target template and match; red spheres, residues in only the query template and target match; blue spheres, residues in only the target template and query match.
Figure 6. EC Classes of ETA Predictions.
Distribution of 320 reciprocal ETA annotations among the first digit EC classes, including both first and second order predictions.
Figure 7. Examples of ETA Predictions.
Reciprocal matches contributing to three novel ETA function predictions, with the query in orange and the target in green, and template/match residues using the scheme in Figure 5. 7a, query 1jrk, chain A, vs. target 1vhz, chain B; 7b, 1wwz, chain B, vs. 1y9w, chain A; 7c, 2fl4, chain A, vs. 1wwz, chain B; 7d, 1xkq, chain A, vs. 1jtv, chain A.
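
The sensitivity and accuracy definitions given in the Figure 4 caption can be made concrete with a short sketch. The data layout (matches as (query, target, percent identity) tuples, predictions and known functions as dictionaries) and the function names are assumptions for illustration only, not the format used in the study.

```python
import math


def drop_matches_above_identity(matches, cutoff):
    """Keep matches whose sequence identity (percent) is at or below the cutoff.

    matches: list of (query, target, percent_identity) tuples (hypothetical layout).
    """
    return [m for m in matches if m[2] <= cutoff]


def sensitivity_and_accuracy(predicted, truth):
    """Compute the two percentages as defined in the Figure 4 caption.

    predicted: dict protein -> predicted function, or None if no prediction.
    truth:     dict protein -> known function for every test-set protein.

    Sensitivity: percentage of all test-set proteins with a correct prediction.
    Accuracy:    percentage of the predictions made that are correct.
    """
    made = {p: f for p, f in predicted.items() if f is not None and p in truth}
    correct = sum(1 for p, f in made.items() if truth[p] == f)
    sensitivity = 100.0 * correct / len(truth)
    accuracy = 100.0 * correct / len(made) if made else math.nan
    return sensitivity, accuracy


# Made-up toy data: four proteins, three predictions, two of them correct.
truth = {"p1": "A", "p2": "B", "p3": "C", "p4": "D"}
predicted = {"p1": "A", "p2": "B", "p3": "X", "p4": None}
print(sensitivity_and_accuracy(predicted, truth))
```

The two measures share a numerator but differ in the denominator, so a method that abstains on hard cases can lose sensitivity while keeping accuracy high.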
