Review

Trends Biochem Sci. 2014 Aug;39(8):363-71.

doi: 10.1016/j.tibs.2014.05.006. Epub 2014 Jul 2.

Leveraging structure for enzyme function prediction: methods, opportunities, and challenges

Matthew P Jacobson¹, Chakrapani Kalyanaraman², Suwen Zhao², Boxue Tian²

Affiliations

¹ Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA. Electronic address: matt.jacobson@ucsf.edu.
² Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA.

PMID: 24998033
PMCID: PMC4117707
DOI: 10.1016/j.tibs.2014.05.006

Matthew P Jacobson et al. Trends Biochem Sci. 2014 Aug.

Abstract

The rapid growth in the number of protein sequences that can be inferred from sequenced genomes presents challenges for function assignment, because only a small fraction (currently <1%) has been experimentally characterized. Bioinformatics tools are commonly used to predict the functions of uncharacterized proteins. Recently, there has been significant progress in using protein structures as an additional source of information to infer aspects of enzyme function, which is the focus of this review. Successful application of these approaches has led to the identification of novel metabolites, enzyme activities, and biochemical pathways. We discuss opportunities to systematically elucidate protein domains of unknown function, orphan enzyme activities, dead-end metabolites, and pathways in secondary metabolism.

Keywords: docking; enzyme function prediction; homology modeling; metabolic pathways; protein structures.


Figures

**Figure 1**
Structure-based virtual metabolite docking protocol for enzyme activity prediction. When no experimentally determined structure is available for a protein sequence, a model can be built using a variety of comparative modeling methods, but only when a structure is available for a homologous protein that shares ~30% or greater sequence identity with the protein of interest. Whether using a structure or a model, it is critical that active-site metal ions and cofactors are present and that catalytic residues are positioned appropriately for catalysis. Virtual metabolite libraries can be constructed and "docked" against the putative active sites of structures or models using computational tools more commonly employed in structure-based drug design (e.g., Glide, DOCK). The docking scoring functions can then be used to rank the ligands by their estimated relative binding affinities. Top-scoring metabolites are typically inspected for plausibility (Is the predicted binding mode compatible with catalysis? Is the metabolite likely to be present in the relevant organism?) and then selected for experimental testing (in vitro enzymology). Protocols similar to the one shown here have been used in retrospective and prospective studies [22-25, 27-33, 36, 39].
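The ranking-and-triage step of this protocol can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `DockedLigand` record, the helper names, the two boolean plausibility flags, and the hard 30% template cutoff are all hypothetical, and scores are assumed to follow the usual docking convention in which more negative values indicate tighter predicted binding. Real Glide or DOCK output would first need to be parsed into this form.

```python
from dataclasses import dataclass

# Hypothetical hard cutoff based on the caption's "~30% or greater" rule of thumb.
MIN_TEMPLATE_IDENTITY = 30.0


@dataclass
class DockedLigand:
    """One (hypothetical) entry from a virtual metabolite docking run."""
    name: str
    score: float                   # docking score; more negative = better (assumed)
    pose_supports_catalysis: bool  # does the binding mode fit the catalytic machinery?
    present_in_organism: bool      # is the metabolite plausible in this organism?


def model_is_usable(template_identity_pct: float) -> bool:
    """Accept a comparative model only if its template clears the identity cutoff."""
    return template_identity_pct >= MIN_TEMPLATE_IDENTITY


def select_candidates(ligands: list[DockedLigand], top_n: int) -> list[DockedLigand]:
    """Rank by docking score, keep the top_n, then drop implausible hits."""
    ranked = sorted(ligands, key=lambda lig: lig.score)  # most negative first
    return [
        lig
        for lig in ranked[:top_n]
        if lig.pose_supports_catalysis and lig.present_in_organism
    ]


library = [
    DockedLigand("metabolite_A", -9.1, True, True),
    DockedLigand("metabolite_B", -10.4, False, True),
    DockedLigand("metabolite_C", -8.7, True, False),
    DockedLigand("metabolite_D", -6.2, True, True),
]

if model_is_usable(35.0):
    picks = select_candidates(library, top_n=3)
    print([lig.name for lig in picks])  # ['metabolite_A']
```

With this toy library, metabolite_B scores best but is discarded because its pose is not compatible with catalysis, and metabolite_C is dropped because it is not expected in the organism, leaving only metabolite_A. The hard threshold is a simplification: ~30% is a rule of thumb, and Figure 2(d) reports a useful prediction from a model built on a 29% identity template, so in practice the cutoff may be better treated as a warning than a strict gate.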

**Figure 2**
Predicted binding poses are in good agreement with subsequently determined experimental structures. Predicted ligand binding mode (cyan) superimposed on the X-ray crystal structure (gold) of: (a) S-adenosylhomocysteine deaminase (PDB: 2PLM); (b) N-succinyl-L-Arg racemase (PDB: 2P8C); (c) D-Ala-D-Ala epimerase (PDB: 3Q4D); and (d) a polyprenyl synthase (PDB: 4FP4). In (b), (c), and (d), the docking predictions were made using homology models built from template crystal structures sharing 35%, 39%, and 29% sequence identity with the target, respectively.
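Agreement between a predicted and an experimental pose, as in these panels, is commonly quantified as a ligand root-mean-square deviation (RMSD): the square root of the mean squared distance between corresponding atoms. The sketch below is an illustration under assumptions, not the method used for this figure: it assumes both poses list the same heavy atoms in the same order, that the receptor structures have already been superimposed so no further fitting is done, and it ignores symmetry-equivalent atoms. The 2 Å success threshold in the example is a widely used docking convention, not a value stated in the caption.

```python
import math


def pose_rmsd(predicted, reference):
    """Heavy-atom RMSD (in Å) between two poses of the same ligand.

    Assumes matching atom order and a shared coordinate frame (receptors
    already superimposed), so no additional fitting is applied.
    Symmetry-equivalent atoms are not handled.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Toy three-atom "ligand": every predicted atom sits 1 Å from its reference.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]

rmsd = pose_rmsd(predicted, reference)
print(f"{rmsd:.2f} Å")  # 1.00 Å
print(rmsd <= 2.0)      # True under the common 2 Å convention
```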

**Figure 3**
Structure-guided discovery of new enzymes in a novel hydroxyproline betaine metabolism pathway. Panel (a) lists the name, TrEMBL annotation, and most similar PDB homolog of each protein in the pathway. The automated TrEMBL annotations are incorrect or imprecise for every protein in the pathway; however, as the closest-PDB-homolog column shows, rich structural information is available for modeling and docking. The pathway is shown in (b). Panels (c)-(e) show the binding and/or active sites of HpbD, HpbJ, and HpbR, respectively (shown in bold in (a)), together with the docking-predicted binding mode of the ligand trans-4-hydroxy-L-proline betaine (green ball-and-stick). Both HpbJ and HpbR have a predicted cation-π cage, a motif known to bind quaternary amines. In HpbD, two catalytic residues (Lys163 and Lys265) replace aromatic residues, leaving Trp320 as the key aromatic residue forming a cation-π interaction with the substrate.
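A simple geometric check can flag candidate cation-π cages of the kind predicted for HpbJ and HpbR. This is a hypothetical sketch, not the analysis behind the figure: it assumes coordinates are already available for the cationic nitrogen of the docked quaternary amine and for the ring atoms of nearby aromatic residues, and both the 6 Å nitrogen-to-ring-centroid cutoff and the three-ring minimum for calling a "cage" are illustrative choices. The residue names and coordinates are toy values, not taken from the Hpb proteins.

```python
import math

CATION_PI_CUTOFF = 6.0  # Å; assumed N+ to ring-centroid distance cutoff


def ring_centroid(ring_atoms):
    """Mean position of an aromatic ring's atoms, given as (x, y, z) tuples."""
    n = len(ring_atoms)
    return tuple(sum(atom[i] for atom in ring_atoms) / n for i in range(3))


def cation_pi_contacts(cation_xyz, aromatic_rings, cutoff=CATION_PI_CUTOFF):
    """Names of residues whose ring centroid lies within cutoff of the cation."""
    return [
        name
        for name, atoms in aromatic_rings.items()
        if math.dist(cation_xyz, ring_centroid(atoms)) <= cutoff
    ]


def looks_like_cage(cation_xyz, aromatic_rings, min_rings=3):
    """Hypothetical rule: call it a cage if >= min_rings rings surround N+."""
    return len(cation_pi_contacts(cation_xyz, aromatic_rings)) >= min_rings


def hexagon(center, radius=1.39):
    """Toy planar six-membered ring around a given center (demo data only)."""
    cx, cy, cz = center
    return [
        (cx + radius * math.cos(k * math.pi / 3),
         cy + radius * math.sin(k * math.pi / 3),
         cz)
        for k in range(6)
    ]


n_plus = (0.0, 0.0, 0.0)  # cationic nitrogen of the docked ligand
rings = {
    "Trp_A": hexagon((4.5, 0.0, 0.0)),
    "Tyr_B": hexagon((0.0, 4.5, 0.0)),
    "Phe_C": hexagon((0.0, 0.0, 5.0)),
    "Trp_D": hexagon((9.0, 0.0, 0.0)),  # too far away to contribute
}

print(cation_pi_contacts(n_plus, rings))  # ['Trp_A', 'Tyr_B', 'Phe_C']
print(looks_like_cage(n_plus, rings))     # True
```

A site resembling the HpbD active site described above, where lysines occupy positions that aromatic residues fill in a cage and a single tryptophan contacts the substrate, would likely return one contact and would not be called a cage under this rule.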

**Figure 4**
The biosynthesis of cholesterol: a paradigmatic isoprenoid pathway. Crystal structures of key enzymes in the pathway have been solved, including farnesyl pyrophosphate synthase (gold; PDB: 1RQI), squalene synthase (light blue; PDB: 3WEG), and oxidosqualene-lanosterol cyclase (magenta; PDB: 1W6K). These crystal structures provide opportunities to predict the functions of related enzymes in the isoprenoid synthase superfamily. However, function prediction for the terpenoid synthases (also called terpene cyclases) is extremely challenging because carbocation rearrangements create a huge chemical space of possible products.
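The carbon bookkeeping along the upstream part of this pathway can be checked with simple arithmetic, since each isoprenoid building block (IPP or DMAPP) contributes five carbons. The Python sketch below is an illustration only, and its function and constant names are hypothetical. It also hints at why such bookkeeping helps little with terpenoid synthases: it fixes the carbon count of a product but says nothing about which of the many carbocation-rearranged skeletons is formed.

```python
ISOPRENE_UNIT_C = 5  # each IPP/DMAPP building block contributes five carbons

PRENYL_NAMES = {5: "DMAPP", 10: "GPP", 15: "FPP", 20: "GGPP"}


def elongate(n_ipp_additions):
    """Prenyl chain elongation: start from DMAPP, add one IPP unit per step."""
    carbons = ISOPRENE_UNIT_C  # DMAPP (C5)
    chain = [PRENYL_NAMES[carbons]]
    for _ in range(n_ipp_additions):
        carbons += ISOPRENE_UNIT_C
        chain.append(PRENYL_NAMES.get(carbons, f"C{carbons}"))
    return chain, carbons


chain, fpp_c = elongate(2)         # farnesyl pyrophosphate synthase: two IPP additions
squalene_c = 2 * fpp_c             # squalene synthase joins two FPP molecules
lanosterol_c = squalene_c          # cyclizing 2,3-oxidosqualene keeps every carbon
cholesterol_c = lanosterol_c - 3   # three methyl carbons are removed downstream

print(" -> ".join(chain))                              # DMAPP -> GPP -> FPP
print(fpp_c, squalene_c, lanosterol_c, cholesterol_c)  # 15 30 30 27
```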


Cited by

The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone.
Roche DB, Brüls T. Protein Sci. 2015 May;24(5):643-50. doi: 10.1002/pro.2635. Epub 2015 Jan 28. PMID: 25559918. Free PMC article.
QM/MM free energy simulations: recent progress and challenges.
Lu X, Fang D, Ito S, Okamoto Y, Ovchinnikov V, Cui Q. Mol Simul. 2016;42(13):1056-1078. doi: 10.1080/08927022.2015.1132317. Epub 2016 Jul 5. PMID: 27563170. Free PMC article.
Crystal structure of SgcJ, an NTF2-like superfamily protein involved in biosynthesis of the nine-membered enediyne antitumor antibiotic C-1027.
Huang T, Chang CY, Lohman JR, Rudolf JD, Kim Y, Chang C, Yang D, Ma M, Yan X, Crnovcic I, Bigelow L, Clancy S, Bingman CA, Yennamalli RM, Babnigg G, Joachimiak A, Phillips GN, Shen B. J Antibiot (Tokyo). 2016 Oct;69(10):731-740. doi: 10.1038/ja.2016.88. Epub 2016 Jul 13. PMID: 27406907. Free PMC article.
Chemometric Models of Differential Amino Acids at the Navα and Navβ Interface of Mammalian Sodium Channel Isoforms.
Villa-Diaz F, Lopez-Nunez S, Ruiz-Castelan JE, Salinas-Stefanon EM, Scior T. Molecules. 2020 Aug 3;25(15):3551. doi: 10.3390/molecules25153551. PMID: 32756517. Free PMC article.
PCPD: Plant cytochrome P450 database and web-based tools for structural construction and ligand docking.
Wang H, Wang Q, Liu Y, Liao X, Chu H, Chang H, Cao Y, Li Z, Zhang T, Cheng J, Jiang H. Synth Syst Biotechnol. 2021 Apr 24;6(2):102-109. doi: 10.1016/j.synbio.2021.04.004. eCollection 2021 Jun. PMID: 33997360. Free PMC article.


References

1. UniProtKB/Swiss-Prot protein knowledgebase release 2014_01 statistics. [Online]. Available: http://web.expasy.org/docs/relnotes/relstat.html.
2. UniProtKB/TrEMBL protein database release 2014_01 statistics. [Online]. Available: http://www.ebi.ac.uk/uniprot/TrEMBLstats.
3. Friedberg I. Automated protein function prediction - the genomic challenge. Brief. Bioinform. 2006;7:225–242.
4. Schnoes AM, et al. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput. Biol. 2009;5:e1000605.
5. Seffernick JL, et al. Melamine deaminase and atrazine chlorohydrolase: 98 percent identical but functionally different. J. Bacteriol. 2001;183:2405–2410.

Grants and funding

U54 GM093342/GM/NIGMS NIH HHS/United States
