BMC Bioinformatics. 2010 May 27;11:283.
doi: 10.1186/1471-2105-11-283.

Optimal contact definition for reconstruction of contact maps

Jose M Duarte et al. BMC Bioinformatics. 2010.

Abstract

Background: Contact maps have been extensively used as a simplified representation of protein structures. They capture the most important features of a protein's fold and are preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity, many groups have dedicated considerable effort to contact prediction as a proxy for protein structure prediction. However, a contact map's biological interest depends on the availability of reliable methods for the 3-dimensional reconstruction of the structure.

Results: We use an implementation of the well-known distance geometry protocol to build realistic 3-dimensional protein models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We address three questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability, and c) what is the effect of partial or inaccurate contact information on the recovery of the 3D structure. Our results suggest that contact maps derived by applying a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure, with lower values for the average pairwise RMSD within the ensemble. Interestingly, it is still possible to recover a structure from partial contact information, although wrong contacts can lead to a dramatic loss in reconstruction fidelity.
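The contact definition discussed above (a distance cutoff around one representative atom per residue) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contact_map`, the `min_seq_sep` parameter and the toy coordinates are assumptions made for illustration, and the default cutoff of 9 Å is simply one value from the 9 to 11 Å range reported here.

```python
import numpy as np


def contact_map(coords, cutoff=9.0, min_seq_sep=1):
    """Boolean contact map from an (N, 3) array of per-residue coordinates.

    coords      : one representative atom per residue (e.g. C-beta), in angstroms.
    cutoff      : distance threshold in angstroms (assumed default: 9.0).
    min_seq_sep : minimum |i - j| for a pair to count as a contact
                  (hypothetical parameter; 1 only excludes the diagonal).
    """
    coords = np.asarray(coords, dtype=float)
    # All pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sequence separation between residue indices.
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return (dist <= cutoff) & (sep >= min_seq_sep)


if __name__ == "__main__":
    # Toy example: four points on a line, 3.8 angstroms apart.
    toy = [[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]]
    cm = contact_map(toy, cutoff=9.0)
    print(int(np.triu(cm, k=1).sum()), "contacts")
```

In a distance geometry setting, each contact in such a map would typically become an upper distance bound (at most the cutoff) and each non-contact a lower bound; how the bounds are set in the paper itself is not stated in this abstract.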

Conclusions: Thus, contact maps represent a valid approximation to the structures, with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps, such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
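Accuracy in this abstract is expressed as RMSD against the crystal structure, together with an average pairwise RMSD within the ensemble. The sketch below shows one common way to compute both, using optimal superposition by the Kabsch algorithm. It is an assumption that this matches the paper's exact procedure; the function names `superposed_rmsd` and `mean_pairwise_rmsd` are hypothetical.

```python
import numpy as np


def superposed_rmsd(model, reference):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Uses the Kabsch algorithm (proper rotations only), so a mirror image of
    the reference is NOT superposed onto it.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Correct for a possible reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))


def mean_pairwise_rmsd(models):
    """Average superposed RMSD over all distinct pairs in an ensemble."""
    values = [
        superposed_rmsd(models[i], models[j])
        for i in range(len(models))
        for j in range(i + 1, len(models))
    ]
    return float(np.mean(values))
```

Because a contact map is unchanged by mirroring, a reconstruction pipeline needs some way to pick the correct handedness; restricting the superposition to proper rotations, as above, makes a mirrored model show up as a large RMSD rather than hiding it.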

Figures

Figure 1
Schematic representation of the optimization procedure. 1) the native structure is decomposed into contact maps based on different definitions, 2) the 3D structure is reconstructed from contact information only, obtaining an ensemble of conformations, 3) the accuracy is measured against the original structure. The protein shown is PDB structure 1bxyA. The ensemble corresponds to 6 reconstructions (ribbon representation) in different colours and also contains the native protein (cartoon representation) in blue.
Figure 2
Accuracy of reconstructions. Reconstruction Cα RMSD vs. distance cutoff for each of the contact definitions. Plotted are the mean accuracy values for the set of 60 proteins for Cα, Cβ and Cα + Cβ contact definitions. Horizontal lines mark the minimum RMSD for each of them. The error bars represent the standard deviation across the distribution of 60 proteins.
Figure 3
Number of contacts and reconstruction accuracy. a) RMSD values for the protein 1bkrA using Cα as the contact definition; the size of each dot represents the total number of contacts in the contact map at that cutoff. The red curve is a polynomial fit. b) Change in RMSD per change in number of contacts (ΔRMSD/Δcontacts) against the cutoff for the Cα contact definition, averaged over the 60 proteins in the data set. The red curve is again a polynomial fit.
Figure 4
Variability for different SCOP classes. Comparison of reconstruction accuracy for proteins in the four SCOP classes, using boxplots to depict the distributions of RMSD values. There are exactly 15 proteins per class from the set of 60 PDB representatives. a) Cα, b) Cβ and c) Cα + Cβ contact definitions, all three at a 9 Å cutoff.
Figure 5
Comparison to previous studies. Comparison of our reconstruction RMSD values (black) with those of Vassura et al. (green) and Vendruscolo et al. (red). The set is the one used by Vendruscolo and subsequently by Vassura. Two proteins were eliminated from their set because of ambiguities in the data. The error bars show the variability across different runs (not reported by Vassura).
Figure 6
Reconstruction for incomplete or noisy maps. Behaviour of the reconstruction algorithm with noisy or incomplete data. a) Random subsets are sampled for Cα and Cβ maps, b) random subsets are sampled for Cβ maps at different cutoffs (7, 9, 11 and 13 Å, in different colours) and c) random contact noise is added to the map (Cα and Cβ maps). The 12-protein subset (see Methods) was used for this analysis. For each noise level, 10 random samples were taken and 30 models generated. The error bars represent the variability across the proteins in the set.
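The two perturbations described in the Figure 6 caption, sampling random subsets of contacts and adding random wrong contacts, could be sketched as follows. This is an illustration under stated assumptions, not the authors' protocol: the helper names are hypothetical, the map is assumed to be a symmetric boolean matrix, and the noise level is taken to be a fraction of the number of true contacts, which the caption does not specify.

```python
import numpy as np


def upper_pairs(cmap):
    """(i, j) index pairs with i < j where the boolean matrix is True."""
    i, j = np.nonzero(np.triu(cmap, k=1))
    return np.stack([i, j], axis=1)


def sample_contacts(cmap, fraction, rng):
    """Keep a random fraction of the contacts; drop the rest."""
    cmap = np.asarray(cmap, dtype=bool)
    pairs = upper_pairs(cmap)
    n_keep = int(round(fraction * len(pairs)))
    keep = pairs[rng.permutation(len(pairs))[:n_keep]]
    out = np.zeros_like(cmap)
    out[keep[:, 0], keep[:, 1]] = True
    return out | out.T  # restore symmetry


def add_noise_contacts(cmap, fraction, rng):
    """Add random wrong contacts, as a fraction of the number of true ones."""
    cmap = np.asarray(cmap, dtype=bool)
    n_true = len(upper_pairs(cmap))
    non_contacts = upper_pairs(~cmap)  # off-diagonal pairs not in contact
    n_add = min(int(round(fraction * n_true)), len(non_contacts))
    add = non_contacts[rng.permutation(len(non_contacts))[:n_add]]
    out = cmap.copy()
    out[add[:, 0], add[:, 1]] = True
    out[add[:, 1], add[:, 0]] = True
    return out
```

Passing a seeded generator such as `np.random.default_rng(0)` makes each perturbed map reproducible, which is convenient when several random samples per noise level are needed.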
