BMC Bioinformatics. 2007 Jul 17;8:257.
doi: 10.1186/1471-2105-8-257.

Bayesian refinement of protein functional site matching


Kanti V Mardia et al. BMC Bioinformatics.

Abstract

Background: Matching functional sites is a key problem for understanding protein function and evolution. The commonly used graph-theoretic approach, and other related approaches, require a matching distance threshold to be set a priori according to the noise in atomic positions. This threshold is difficult to pre-determine when matching sites related by varying evolutionary distances and crystallographic precision. Furthermore, because of its strict distance constraints, the graph method is sometimes unable to identify alternative but important solutions in the neighbourhood of the distance-based solution. We consider a Bayesian approach to improving graph-based solutions; in principle, this approach also applies to other methods with strict distance-matching constraints. In contrast to combinatorial formulations, the Bayesian method can flexibly incorporate all types of prior information on specific binding sites (e.g. amino acid types).
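
To make the threshold problem concrete, here is a generic distance-threshold graph matching in the spirit of the graph-theoretic approach described above. Candidate residue pairs become nodes, two pairs are joined when their internal distances agree within a fixed threshold, and a largest clique gives the match. This is an illustrative sketch under stated assumptions, not the authors' implementation. The site representation (residue type plus coordinates), the function names `correspondence_graph` and `max_clique`, and the threshold values are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical site format: list of (residue_type, (x, y, z)) tuples.

def correspondence_graph(site_a, site_b, threshold, same_type=True):
    """Build the correspondence graph used by distance-threshold matching."""
    # Nodes: candidate pairs (i, j) that pair residue i of A with residue j of B,
    # optionally restricted to identical residue types (a crude amino acid constraint).
    nodes = [(i, j)
             for i, (ta, _) in enumerate(site_a)
             for j, (tb, _) in enumerate(site_b)
             if not same_type or ta == tb]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each residue may be matched at most once
        d_a = math.dist(site_a[i][1], site_a[k][1])
        d_b = math.dist(site_b[j][1], site_b[l][1])
        # Strict constraint: internal distances must agree within the threshold.
        if abs(d_a - d_b) <= threshold:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique, sorted."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(best)

if __name__ == "__main__":
    a = [("G", (0, 0, 0)), ("D", (3, 0, 0)), ("K", (0, 4, 0))]
    b = [("G", (10, 10, 10)), ("D", (13, 10, 10)), ("K", (10, 15, 10))]
    for t in (0.5, 1.5):
        print(t, max_clique(correspondence_graph(a, b, t)))
```

In the usage example the lysine of the second site is displaced by 1 Å. At a 0.5 Å threshold it drops out of the clique, while at 1.5 Å all three residues are matched, which illustrates why the threshold has to be tuned to the noise level.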

Results: We present a new meta-algorithm for matching protein functional sites (active sites and ligand-binding sites) based on an initial graph matching followed by refinement using a Markov chain Monte Carlo (MCMC) procedure. This procedure is an innovative extension of our recent work. The method accounts for the three-dimensional structure of the site as well as the physico-chemical properties of the constituent amino acids. Compared with the graph method, the MCMC procedure can markedly increase the number of significant matches, as measured independently by rigorously derived p-values.
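
The refinement idea can be illustrated with a toy Metropolis sampler over residue correspondences. This is a simplified stand-in and not the paper's model: the authors' Bayesian procedure works with the three-dimensional coordinates and amino acid properties. Here, by assumption, a hypothetical log-score rewards each matched pair (standing in for a prior on the number of matches) and applies a Gaussian penalty with scale `sigma` to differences between internal distances. Because this penalty is soft rather than a hard cut-off, the sampler can visit matchings that a strict threshold would exclude. The names `log_score`, `propose` and `refine`, and the values of `sigma` and `reward`, are illustrative assumptions.

```python
import math
import random

def log_score(match, site_a, site_b, sigma, reward):
    """Hypothetical unnormalised log-posterior of a one-to-one matching.

    match: dict mapping residue index in A -> residue index in B.
    site_a, site_b: lists of (x, y, z) coordinates.
    """
    score = reward * len(match)  # assumed prior reward per matched pair
    pairs = sorted(match.items())
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (i, j), (k, l) = pairs[a], pairs[b]
            diff = math.dist(site_a[i], site_a[k]) - math.dist(site_b[j], site_b[l])
            score -= diff * diff / (2.0 * sigma * sigma)  # soft Gaussian penalty
    return score

def propose(match, n_a, n_b, rng):
    """Symmetric proposal: pick a random pair (i, j), then remove, add or reassign."""
    i, j = rng.randrange(n_a), rng.randrange(n_b)
    used_b = set(match.values())
    new = dict(match)
    if match.get(i) == j:
        del new[i]          # remove the existing pair i -> j
    elif j not in used_b:
        new[i] = j          # add i -> j, or move i to the free residue j
    return new              # unchanged if j is already taken by another residue

def refine(start, site_a, site_b, n_iter=5000, sigma=0.5, reward=3.0, seed=0):
    """Metropolis sampling from `start`; returns the best matching visited."""
    rng = random.Random(seed)
    current = dict(start)
    cur = log_score(current, site_a, site_b, sigma, reward)
    best, best_score = dict(current), cur
    for _ in range(n_iter):
        cand = propose(current, len(site_a), len(site_b), rng)
        new = log_score(cand, site_a, site_b, sigma, reward)
        # Accept with probability min(1, exp(new - cur)).
        if new >= cur or rng.random() < math.exp(new - cur):
            current, cur = cand, new
            if cur > best_score:
                best, best_score = dict(current), cur
    return best, best_score
```

The proposal picks each (i, j) with the same probability and every move has a reverse move of equal probability, so plain Metropolis acceptance is valid. In a toy case where the second site is an exact translation of the first, starting from a partial two-residue match such as `{0: 0, 1: 1}`, the highest-scoring matching visited should be the full identity correspondence.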

Conclusion: The MCMC refinement step can significantly improve graph-based matches. We apply the method to matching NAD(P)(H)-binding sites within single Rossmann fold families, between different families in the same superfamily, and in different folds. Within families, sites are often well conserved, but there are examples where significant shape-based matches do not retain similar amino acid chemistry, indicating that even within families the same ligand may be bound using substantially different physico-chemistry. We also show that the procedure finds significant matches between binding sites for the same co-factor in different families and different folds.


Figures

Figure 1
Alcohol dehydrogenase NAD-binding site (1hdx_1) matched against the SCOP alcohol dehydrogenase-like family (Case 1). a) Graph matching prior to the MCMC refinement step, with and without amino acid property information. Each site in the family is represented by a circle (with amino acid property information) and a cross (without), connected by a straight line to highlight the difference. b) The matches in (a) after the MCMC refinement step.
Figure 2
Effect of MCMC refinement on graph matches of 1hdx_1 (alcohol dehydrogenase) against the SCOP alcohol dehydrogenase-like family (Case 1), where corresponding amino acids are restricted to amino acids in the same group. Each site in the family is represented by a circle (graph only) and a cross (with MCMC refinement), connected by a straight line to highlight the difference.
Figure 3
Corresponding amino acids between the NAD-binding site of alcohol dehydrogenase (1hdx_1) and the NADP-binding site of quinone oxidoreductase (1qor_0) before and after MCMC refinement, with the glycine-rich motif highlighted (see main text) (Case 1).
Figure 4
Corresponding amino acids between the NAD-binding site of alcohol dehydrogenase (1hdx_1) and the NADP-binding site of hypothetical protein YhdH (1o8c_1) before and after the MCMC refinement step, with the glycine-rich motif highlighted (see main text) (Case 1).
Figure 5
Effect of MCMC refinement on graph matches of 1a27_0 (17β-hydroxysteroid dehydrogenase) against the SCOP tyrosine-dependent oxidoreductase family (Case 2), where corresponding amino acids are not restricted to amino acids in the same group. Each site in the family is represented by a circle (graph only) and a cross (with MCMC refinement), connected by a straight line to highlight the difference.
Figure 6
Superposition of matching amino acids (Case 3) between alcohol dehydrogenase (1hdx_1; blue) and glyceraldehyde-3-phosphate dehydrogenase (3dbv_3; red) after MCMC refinement (RMSD = 0.672; number of corresponding amino acids = 12; p-value = 3.68e-05). The matched dinucleotide-binding motif is shown in ball-and-stick representation. Ligands are shown in CPK colours.
Figure 7
Histograms and traces of the parameters when matching 17β-hydroxysteroid dehydrogenase and carbonyl reductase (1cyd_1).
Figure 8
Decision tree for refining the graph solution by the MCMC method. Boxes with curved corners show processes and their output, while boxes with sharp corners are branching conditions. The procedure starts with the graph solution M_G, whose RMSD and number of matches are denoted RMSD_G and L_G respectively. MCMC is re-iterated until the MCMC solution, M_B, is better. The RMSD and number of matches for M_B are denoted RMSD_B and L_B respectively. M_B and M_G are compared using either 1) their RMSDs and numbers of matches or 2) their p-values, denoted P_B and P_G respectively.
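
The comparison logic of this decision tree can be sketched as follows. The caption does not state the exact rule for deciding that the MCMC solution is better, so the criterion below is an assumption: either at least as many matches and no larger RMSD, strictly better in one of the two, or a smaller p-value. The cap on the number of MCMC re-runs is also an assumption, added so that the loop always terminates. `Solution`, `is_better` and `refine_until_better` are hypothetical names, and `run_mcmc` stands for any callable that performs one MCMC refinement.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Solution:
    rmsd: float                       # RMSD of the corresponding residues
    n_matches: int                    # number of corresponding residues (L)
    p_value: Optional[float] = None   # significance of the match, if computed

def is_better(mb: Solution, mg: Solution, use_p_values: bool = False) -> bool:
    """Assumed criterion for 'the MCMC solution beats the graph solution'."""
    if use_p_values:
        # Option 2: compare p-values (both must have been computed).
        return mb.p_value < mg.p_value
    # Option 1: no worse in both RMSD and match count, strictly better in one.
    no_worse = mb.n_matches >= mg.n_matches and mb.rmsd <= mg.rmsd
    strictly = mb.n_matches > mg.n_matches or mb.rmsd < mg.rmsd
    return no_worse and strictly

def refine_until_better(mg: Solution, run_mcmc: Callable[[], Solution],
                        max_rounds: int = 10, use_p_values: bool = False) -> Solution:
    """Re-run MCMC until it beats the graph solution; otherwise keep the graph solution."""
    for _ in range(max_rounds):
        mb = run_mcmc()
        if is_better(mb, mg, use_p_values):
            return mb
    return mg
```

Under option 1 as written, a solution with a lower RMSD but fewer matches does not count as better. Other trade-offs between RMSD and match count would be equally plausible readings of the figure.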

