PLoS Comput Biol. 2010 Jan 8;6(1):e1000636.
doi: 10.1371/journal.pcbi.1000636.

Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families

Marc Röttig et al. PLoS Comput Biol. 2010.

Abstract

An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/.
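
The nearest neighbour rule mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the sequences, the substrate labels, and the function names `identity` and `nearest_neighbour_label` are hypothetical, and the sequences are assumed to be pre-aligned to equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions where both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Columns where both sequences have a gap carry no information; skip them.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def nearest_neighbour_label(query: str, labelled: list) -> str:
    """Return the label of the training sequence most similar to the query."""
    _, best_label = max(labelled, key=lambda item: identity(query, item[0]))
    return best_label


# Toy, made-up aligned sequences with made-up substrate labels (assumption).
training = [
    ("MKV-LSTRAG", "substrate_A"),
    ("MKVALSTKAG", "substrate_A"),
    ("MRI-LDNRSG", "substrate_B"),
]
print(nearest_neighbour_label("MRIALDNRSG", training))
```

A rule of this kind uses the whole sequence, so residues far from the active site count as much as those lining it. That is the limitation the abstract contrasts with the structure-informed ASC approach.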

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Graphical overview of the ASC method.
(A) In the first step, training sequences are aligned using 3DCoffee to obtain a multiple sequence alignment (MSA). (B) In the second step, residues lining the active site are extracted from the template structure. (C) The third step maps the extracted residues along the MSA to obtain a signature of the active site for each sequence. (D) These signatures are then encoded into feature vectors using three descriptors. Alternatively, kernels may be used. (E) The final ASC model is trained using the generated feature vectors.
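
Steps (B) to (D) of this overview can be sketched as follows. This is a minimal sketch under stated assumptions, not the ASC implementation: the alignment, the active-site positions, and all function names are hypothetical, and a plain one-hot encoding stands in for the descriptors and kernels, which are not specified here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def template_columns(aligned_template: str, site_positions: list[int]) -> list[int]:
    """Map 1-based residue positions of the ungapped template to 0-based MSA columns."""
    column_of = {}
    residue_index = 0
    for column, char in enumerate(aligned_template):
        if char != "-":
            residue_index += 1
            column_of[residue_index] = column
    return [column_of[p] for p in site_positions]


def signature(aligned_seq: str, columns: list[int]) -> str:
    """Read off the residues (or gaps) found in the active-site columns."""
    return "".join(aligned_seq[c] for c in columns)


def one_hot(sig: str) -> list[int]:
    """Encode a signature as 20 indicator values per position; gaps become all zeros."""
    vector = []
    for residue in sig:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Toy, made-up alignment (assumption); "template" plays the role of the
# sequence of the representative structure.
msa = {
    "template": "MK-TAYLG",
    "seq1": "MRQTA-LG",
    "seq2": "M-QSAYIG",
}
# Hypothetical active-site residues, numbered on the ungapped template "MKTAYLG".
site_positions = [2, 4, 6]

columns = template_columns(msa["template"], site_positions)
signatures = {name: signature(seq, columns) for name, seq in msa.items()}
features = {name: one_hot(sig) for name, sig in signatures.items()}
print(columns, signatures)
```

The resulting feature vectors, paired with known substrate labels, could then be passed to any SVM implementation for step (E), for example scikit-learn's `SVC`.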
Figure 2. View of the superimposed active sites of IPMDH and ICDH.
The first chain of the homo-dimeric enzyme is represented by its solvent-excluded surface. The second chain is depicted in a backbone representation. The two substrates, isocitrate (purple) and isopropylmalate (green), lie in the interface of the two chains. IPMDH sidechains are coloured green and sidechains from ICDH (PDB-Id: 1AI2, [40]) are coloured purple. This figure was created using BALLView.

References

    1. Valencia A. Automatic annotation of protein function. Curr Opin Struct Biol. 2005;15:267–274.
    2. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J. 1986;5:823–826.
    3. Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
    4. Durbin R, Eddy S, Krogh A, Mitchison G. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press; 1998.
    5. Finn RD, Mistry J, Schuster-Boeckler B, Griffiths-Jones S, Hollich V, et al. Pfam: clans, web tools and services. Nucleic Acids Res. 2006;34:D247–D251.
