Prediction and experimental validation of enzyme substrate specificity in protein structures

Shivas R Amin¹, Serkan Erdin, R Matthew Ward, Rhonald C Lua, Olivier Lichtarge

PMID: 24145433
PMCID: PMC3831482
DOI: 10.1073/pnas.1305162110

Proc Natl Acad Sci U S A. 2013 Nov 5;110(45):E4195-202. doi: 10.1073/pnas.1305162110. Epub 2013 Oct 21.

Affiliation

¹ Department of Molecular and Human Genetics and Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030.

Abstract

Structural Genomics aims to elucidate protein structures to identify their functions. Unfortunately, the variation of just a few residues can be enough to alter activity or binding specificity and limit the functional resolution of annotations based on sequence and structure; in enzymes, substrates are especially difficult to predict. Here, large-scale controls and direct experiments show that the local similarity of five or six residues selected because they are evolutionarily important and on the protein surface can suffice to identify an enzyme activity and substrate. A motif of five residues predicted that a previously uncharacterized *Silicibacter* sp. protein was a carboxylesterase for short fatty acyl chains, similar to hormone-sensitive-lipase-like proteins that share less than 20% sequence identity. Assays and directed mutations confirmed this activity and showed that the motif was essential for catalysis and substrate specificity. We conclude that evolutionary and structural information may be combined on a Structural Genomics scale to create motifs of mixed catalytic and noncatalytic residues that identify enzyme activity and substrate specificity.

Keywords: evolutionary trace; function annotation; protein function; structural motif.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
ETA accurately determines substrate specificity. (A) The ET algorithm is applied to a protein from *Sulfolobus tokodaii* strain 7 (green, PDB ID code 2eer, chain A) to identify evolutionarily important residues. A cluster of 10 or more important residues is identified, and a Template Picker algorithm then selects five or six of them to act as a template that is used to probe a target library of proteins with known functions. A paired-distance matching algorithm identifies regions in the target-library structures that are similar to the template. The matches found are then passed to the SVM, which identifies significant matches based on geometric and evolutionary similarities. ETA repeats all these steps reciprocally, generating templates from target structures and searching for matches in the query protein. Following this protocol, ETA suggests four matches to the query protein: alcohol dehydrogenase from *Saccharomyces cerevisiae* (blue left, PDB ID code 2hcy), alcohol dehydrogenase from *S. solfataricus* (blue middle, PDB ID code 1r37), human class II alcohol dehydrogenase (blue right, PDB ID code 3cos), and NADP(H)-dependent cinnamyl alcohol dehydrogenase from *S. cerevisiae* (red, PDB ID code 1piw). (B) The most frequent function among the matches, alcohol dehydrogenase activity (EC 1.1.1.1), is identified with a high confidence value of 1.125, calculated as shown in the box. (C) Comparison of PPV versus confidence score binned at <1, =1, and >1 for both six-residue templates (*Left*) and five-residue templates (*Right*) when considering only matches of <30% sequence identity. For more detail, see Fig. S1. (D) Comparison of PPV when predictions are made using ETA or the closest structural match (TM-align). The horizontal axis shows the maximum sequence identity of matches for the proteins depicted in the corresponding bars; the vertical axis is the PPV for each bin range.
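
The matching and voting steps described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1.0 Å tolerance, and the toy coordinates are assumptions made for illustration, and the SVM filtering step and the confidence formula shown in the figure's box are not reproduced here. The sketch compares all inter-residue alpha-carbon distances of a template against a candidate site, takes the most frequent function label among accepted matches, and computes PPV with its standard definition, TP / (TP + FP).

```python
from collections import Counter
from itertools import combinations
from math import dist


def pairwise_distances(coords):
    """Distances between every pair of points, in a fixed pair order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


def distances_match(template, candidate, tolerance=1.0):
    """True if every inter-residue distance agrees within `tolerance` (in Å).

    Both inputs are lists of (x, y, z) alpha-carbon coordinates given in
    corresponding residue order. The tolerance is an illustrative assumption.
    """
    if len(template) != len(candidate):
        return False
    return all(
        abs(t - c) <= tolerance
        for t, c in zip(pairwise_distances(template), pairwise_distances(candidate))
    )


def most_common_function(match_labels):
    """Return (label, count) for the most frequent function among matches."""
    return Counter(match_labels).most_common(1)[0]


def ppv(true_positives, false_positives):
    """Positive predictive value: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Toy five-residue template and a translated copy of it (hypothetical data).
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0),
            (0.0, 0.0, 3.8), (3.8, 3.8, 0.0)]
shifted = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in template]
print(distances_match(template, shifted))

# Three matches share one label and a fourth differs, as in panel A.
print(most_common_function(["EC 1.1.1.1"] * 3 + ["other"]))
```

Because only internal distances are compared, the check is unaffected by how the two structures are positioned or oriented. It also cannot tell a site from its mirror image, so a practical matcher would need a further geometric test.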

**Fig. 2.**
Noncatalytic residues are prevalent in structurally invariant ETA templates. (A) Comparison of log propensities of ETA six-residue templates and known catalytic residues from MACiE database 3.0. Glycine and proline have higher propensities in ETA templates than in catalytic sites (Pearson coefficient = 0.58 considering all residues; 0.91 when ignoring G and P). (B) The rmsd of structural alignments of ETA matches, binned according to sequence identity. For all matches, alignments were generated from the ETA templates and from the entire structures, using all atoms (lovoalign) and only the alpha carbons (TM-align). Negative control templates were also made using clusters of evolutionarily unimportant residues and aligned (alpha carbon only).
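
The comparison in panel A can be sketched as follows. The caption does not give the exact propensity definition, so this sketch assumes a common one: the log of a residue type's fraction within a set divided by its background fraction. The function names and the example counts are hypothetical, and the Pearson coefficient is computed from its textbook formula.

```python
from math import log, sqrt


def log_propensities(counts, background):
    """Log of (fraction in the set / background fraction) per residue type.

    `counts` maps residue type to its count in the set; `background` maps
    residue type to its background fraction. Types with zero count are skipped.
    """
    total = sum(counts.values())
    return {
        aa: log((n / total) / background[aa])
        for aa, n in counts.items()
        if n > 0
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def propensity_correlation(set_a, set_b, exclude=()):
    """Correlate two propensity tables over shared residue types."""
    shared = sorted(k for k in set_a if k in set_b and k not in exclude)
    return pearson([set_a[k] for k in shared], [set_b[k] for k in shared])


# Hypothetical propensity tables in which G and P behave differently.
templates = {"G": 5.0, "P": 4.0, "H": 1.0, "D": 2.0, "S": 3.0}
catalytic = {"G": -1.0, "P": -2.0, "H": 2.0, "D": 4.0, "S": 6.0}
print(propensity_correlation(templates, catalytic))
print(propensity_correlation(templates, catalytic, exclude=("G", "P")))
```

Excluding G and P raises the correlation in this toy data, mirroring the direction of the change reported in the caption (0.58 to 0.91), though the numbers here are invented.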

**Fig. 3.**
Using evolution as a guide, ETA identifies structural homologs that perform the same function. (A) The closest structural match to the human zeta-crystallin (blue, PDB ID code 1yb5, chain A) is an alcohol dehydrogenase from *S. solfataricus* (red, PDB ID code 1r37, chain B) with an rmsd of 1.99 Å; however, the two proteins are functionally dissimilar. Conversely, ETA correctly matches the zeta-crystallin to *E. coli* quinone oxidoreductase (green, PDB ID code 1qor, chain B) despite a larger global rmsd value of 2.25 Å. (B) Comparison of the pretemplate residue clusters (alpha carbons) identified by ETA. The residue cluster for the *E. coli* quinone oxidoreductase more closely matches the cluster of human zeta-crystallin than does that of the *S. solfataricus* alcohol dehydrogenase. The *S. solfataricus* alcohol dehydrogenase coordinates zinc (orange), which leads to different substrate specificity despite the similarity in global topology. The cluster alignments are also shown in sequence form in A next to the target structures, where one dot signifies aligned residues and two dots signify an rmsd of less than 0.5 Å. (C and D) ETA alpha-carbon templates for zeta-crystallin (blue), quinone oxidoreductase (green), and alcohol dehydrogenase (red) represented as spheres. (E) Table of template residues with ET coverage values, where values closer to 0 are evolutionarily important and values closer to 100 are unimportant. The yellow highlights mark residues where the zeta-crystallin/alcohol dehydrogenase match sites differ by >5%.
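
The highlighting rule in panel E can be expressed as a small sketch. It assumes that the ">5%" criterion means an absolute difference of more than 5 points on the 0 to 100 coverage scale. The function names are hypothetical, and the coverage values are placeholders rather than the values in the figure's table.

```python
def coverage_differences(query_coverage, match_coverage):
    """Absolute ET coverage difference at each aligned template position."""
    if len(query_coverage) != len(match_coverage):
        raise ValueError("templates must have the same number of positions")
    return [abs(q - m) for q, m in zip(query_coverage, match_coverage)]


def divergent_positions(query_coverage, match_coverage, threshold=5.0):
    """Indices of positions whose coverage differs by more than `threshold`."""
    diffs = coverage_differences(query_coverage, match_coverage)
    return [i for i, d in enumerate(diffs) if d > threshold]


# Placeholder coverage values (0 = most important, 100 = least important);
# these are NOT the values from the figure's table.
query = [1.0, 2.5, 4.0, 6.0, 8.0, 10.0]
good_match = [1.5, 3.0, 4.5, 7.0, 9.0, 12.0]
poor_match = [1.0, 20.0, 4.0, 35.0, 8.5, 50.0]

print(divergent_positions(query, good_match))  # no position exceeds the threshold
print(divergent_positions(query, poor_match))  # positions that would be highlighted
```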

**Fig. 4.**
ETA template residues accurately identify substrate specificity and are necessary for function. (A) Validation of high-confidence predictions of enzymatic activity for two uncharacterized Structural Genomics proteins using crude lysate preps. (*Left*) dhaf_2064 has significantly more myo-inositol dehydrogenase activity than lysates lacking the protein (empty vector) and lysates containing the dhaf_2064 E95A template mutant (P = 0.005, n = 6). (*Right*) tm1040_2492 has significantly more carboxylesterase activity than control lysates lacking the protein (P = 0.0005, n = 6). (B) ETA matching of tm1040_2492 (PDB ID code 2pbl, chain C) to three carboxylesterases (EC 3.1.1.1): EstE2 (PDB ID code 2hm7, chain A), EstE1 (PDB ID code 2c7b, chain B), and AFEST (PDB ID code 1jji, chain D). ETA did not match tm1040_2492 to Lip1 (PDB ID code 1trh, chain A), a lipase (EC 3.1.1.3). (C) Structural alignment of tm1040_2492 and the three ETA matches; labels correspond to residue numbers in tm1040_2492. (D) Structural alignment of tm1040_2492 and Lip1 shows that the proline residue at position 104 does not have a reciprocal cognate residue in Lip1 (black arrow). (E) Dependence of catalytic activity on carbon chain length of substrate. tm1040_2492 only catalyzes hydrolysis when substrates have ≤10 carbon atoms in the fatty acid chain. (F) Specific activity of WT tm1040_2492 and template and control mutants toward 4-nitrophenyl acetate (C2 in D). All template mutants have significantly less carboxylesterase activity than the WT and control mutants (all P ≤ 0.005, n = 6). Additionally, the W73F mutant had significantly more activity than the W73A mutant (P < 0.005, n = 6). All error bars represent SD.
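
The caption reports P values for groups of n = 6 and SD error bars but does not name the statistical test. As a stand-in, the sketch below summarizes each group as mean and sample SD and compares two groups with an exact two-sided permutation test on the difference of means. The test choice, the function names, and the activity values are all assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) for one group of replicates."""
    return mean(values), stdev(values)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way to split the pooled values into groups of the
    original sizes and counts splits at least as extreme as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        if difference >= observed - 1e-12:  # small slack for float rounding
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Placeholder specific-activity values in arbitrary units (hypothetical data).
wild_type = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
mutant = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(summarize(wild_type))
print(permutation_p_value(wild_type, mutant))
```

With six replicates per group there are 924 possible splits, so the smallest two-sided P value this test can produce is 2/924, about 0.002.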
