PLoS Comput Biol. 2009 Dec;5(12):e1000584.

doi: 10.1371/journal.pcbi.1000584. Epub 2009 Dec 4.

Defining an essence of structure determining residue contacts in proteins

R Sathyapriya¹, Jose M Duarte, Henning Stehr, Ioannis Filippis, Michael Lappe

PMID: 19997489
PMCID: PMC2778133
DOI: 10.1371/journal.pcbi.1000584

R Sathyapriya et al. PLoS Comput Biol. 2009 Dec.

Affiliation

¹ Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany.

Abstract

The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little is known about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has so far remained elusive: to date, no algorithmic strategy has been devised that outperforms a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). Identifying the number and nature of essential native contacts is not only of theoretical interest (e.g., for the design of advanced statistical potentials); such a subset of spatial constraints is also very useful in a number of novel experimental methods (like EPR) that rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis on them. As a reference, a series of random subsets (ranging from 10% to 90% of native contacts) was generated for each protein, and the reconstruction accuracy was computed for each subset. We have developed a rational strategy, termed "cone-peeling", that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). By contrast, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy.
This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.
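The random reference sets can be illustrated with a short sketch. It assumes the native contacts are given as residue-index pairs and draws, for each fraction from 10% to 90%, a number of independent uniform random samples without replacement (two by default, following the Figure 2 caption). The function name `random_reference_subsets` and the seeding scheme are illustrative choices, not the authors' code.

```python
import random


def random_reference_subsets(contacts, fractions=None, repeats=2, seed=0):
    """Draw random subsets of native contacts at several fractions.

    contacts:  iterable of (i, j) residue-index pairs (assumed input format)
    fractions: fractions of contacts to keep; defaults to 0.1, 0.2, ..., 0.9
    repeats:   number of independent random selections per fraction
    seed:      seed for a private RNG so that results are reproducible
    """
    if fractions is None:
        fractions = [k / 10 for k in range(1, 10)]
    rng = random.Random(seed)
    pool = sorted(set(contacts))  # fixed order before sampling, for reproducibility
    subsets = {}
    for fraction in fractions:
        size = round(fraction * len(pool))
        # sample() draws without replacement, so no subset contains duplicates
        subsets[fraction] = [rng.sample(pool, size) for _ in range(repeats)]
    return subsets


# Usage: 50 toy contacts give two random subsets of 25 contacts at the 50% level.
toy_contacts = [(i, i + 3) for i in range(50)]
reference = random_reference_subsets(toy_contacts)
print(len(reference[0.5][0]))  # 25
```

Each subset would then be passed to the reconstruction step and scored by its Cα RMSD to the native structure.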

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. The concept of structural essence.**
The concept of a minimal set of contacts essential for reconstructing the three-dimensional structure is illustrated using CheY (1e6k). A The native structure of 1e6k is shown in ribbon representation (pink). B The Cα contacts are visualized in a contact map; the inset highlights all the Cα contacts (red) on the cartoon representation. C A subset selected from the native contact map is highlighted (black); the inset shows the selected subset mapped onto the structure. D The structure reconstructed from the selected subset is shown in ribbon representation (blue). E Superposition of the native and the reconstructed structures. The reconstruction accuracy is measured as the Cα RMSD between the native structure and the reconstructed model after superposition.
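The accuracy measure in Figure 1 is the Cα RMSD after optimal superposition. The sketch below is a minimal pure-Python version, assuming both structures are given as equal-length lists of Cα coordinates in the same residue order. It uses Horn's quaternion formulation (the largest eigenvalue of a 4x4 symmetric matrix, found here with cyclic Jacobi rotations) instead of building an explicit rotation matrix; this is one standard way to obtain the superposed RMSD, not necessarily the software the authors used. The names `superposed_rmsd` and `_largest_eigenvalue` are illustrative.

```python
import math


def _centred(points):
    """Translate points so that their centroid lies at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def _largest_eigenvalue(matrix, max_sweeps=100, tol=1e-20):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-12:
                    continue  # already negligible
                # Rotation angle chosen so that the new a[p][q] becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(model, native):
    """Cα RMSD between two conformations after optimal rigid-body superposition."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    a, b = _centred(model), _centred(native)
    # Cross-covariance S[u][v] = sum over residues of a[u] * b[v]
    s = [[sum(p[u] * q[v] for p, q in zip(a, b)) for v in range(3)] for u in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(horn)  # maximum of sum(b . R a) over proper rotations R
    g_a = sum(x * x for p in a for x in p)
    g_b = sum(x * x for p in b for x in p)
    msd = max(0.0, (g_a + g_b - 2.0 * lam) / len(a))  # clamp tiny negative rounding
    return math.sqrt(msd)


# Usage: each point is 1 Å from its partner, so the RMSD is 1.0.
print(superposed_rmsd([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]))  # 1.0
```

Because the quaternion form only admits proper rotations, a model that is the mirror image of the native structure correctly receives a non-zero RMSD.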

**Figure 2. Subsets from random selection.**
A Increasing fractions of contacts (from 10% to 100%) are selected at random and reconstructed. Two independent random selections are performed for every fraction, and the average Cα RMSD is reported for every protein in each SCOP class. Each class consists of three structures; in each class, ‘*’ denotes the protein that is three times as large as the other two. B The reconstruction accuracies of the random subsets are compared between our method and that of Chen and co-workers. Five proteins (1dd3, 1nxb, 1igd, 1bxy, 1d0d) are selected from the Chen dataset, and random subsets are generated with (i) our contact definitions, Cα 9.0 Å and Cβ 8.0 Å (red), and (ii) the contact definition of Chen et al., Cβ 7.5 Å (black). Subsets from (i) and (ii) are reconstructed with Tinker. (iii) The reconstruction accuracy reported by Chen et al. is shown in blue.
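Contacts in Figure 2 are defined by distance cut-offs of 9.0 Å between Cα atoms and 8.0 Å between Cβ atoms. A minimal sketch of building such a contact set from coordinates follows. It assumes one representative atom per residue, counts a distance equal to the cut-off as a contact, and exposes a hypothetical `min_seq_sep` parameter, since this text does not state whether sequence-adjacent residues are excluded.

```python
import math
from itertools import combinations

CA_CUTOFF = 9.0  # Å, Cα-Cα contact definition quoted in Figure 2
CB_CUTOFF = 8.0  # Å, Cβ-Cβ contact definition quoted in Figure 2


def contact_set(coords, cutoff, min_seq_sep=1):
    """Return residue pairs (i, j), i < j, whose atoms lie within `cutoff` Å.

    coords:      list of (x, y, z) tuples, one representative atom per residue
    min_seq_sep: hypothetical parameter; pairs closer than this in sequence are skipped
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_seq_sep:
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            contacts.add((i, j))
    return contacts


# Usage: a straight toy "chain" with 3.8 Å between consecutive Cα atoms.
chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
print(sorted(contact_set(chain, CA_CUTOFF)))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

The same function yields the Cβ graph when given Cβ coordinates and `CB_CUTOFF`; the two sets together correspond to the "Cα-Cα and Cβ-Cβ" contacts mentioned in the abstract.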

**Figure 3. Sequence-range based subset selection.**
The reconstruction accuracies of the short-range (left) and long-range (right) subsets are shown (blue). The entire short-range (SR) and long-range (LR) contact subsets are used in reconstruction and compared against random subsets of similar size (red). The class average is the mean of the Cα RMSDs over the ensembles (the best quarter of the models) of every protein in the class. The sizes of the SR and LR subsets vary slightly between SCOP classes; however, the trend was preserved for both the Cα and the Cβ graphs. (Average sizes for the Cα graphs: All α: SR = 62.2%, LR = 37.8%; All β: SR = 51.1%, LR = 49.9%; α/β: SR = 55.5%, LR = 44.5%; α+β: SR = 51.1%, LR = 48.9%.)
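The short-range/long-range split in Figure 3 partitions each contact set by the sequence separation |i − j|. The caption does not state the separation threshold, so `lr_min_sep` below is a hypothetical parameter. The sketch also reports both parts as percentages of all contacts, the form in which the subset sizes are given above.

```python
def split_by_sequence_range(contacts, lr_min_sep):
    """Partition contacts into short-range and long-range sets.

    A contact (i, j) counts as long-range when |i - j| >= lr_min_sep
    (a hypothetical threshold) and as short-range otherwise.
    """
    short_range, long_range = set(), set()
    for i, j in contacts:
        target = long_range if abs(i - j) >= lr_min_sep else short_range
        target.add((i, j))
    return short_range, long_range


def percentages(short_range, long_range):
    """Return the SR and LR sizes as percentages of all contacts."""
    total = len(short_range) + len(long_range)
    if total == 0:
        return 0.0, 0.0
    return 100 * len(short_range) / total, 100 * len(long_range) / total


# Usage with a toy contact set and an illustrative threshold of 12 residues.
toy = {(0, 2), (1, 3), (4, 7), (0, 20), (5, 30)}
sr, lr = split_by_sequence_range(toy, lr_min_sep=12)
print(percentages(sr, lr))  # (60.0, 40.0)
```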

**Figure 4. Common Neighbourhood of an edge (Cn(E_ij)).**
A contact $E_{ij}$ (red) between nodes $i$ (pink) and $j$ (green) is shown. Let $N_i$ be the set of neighbours of node $i$ and $N_j$ the set of neighbours of node $j$ (grey). The CNb of edge $E_{ij}$ is defined as $\mathrm{CNb}(E_{ij}) = N_i \cap N_j$. The nodes $k_1$, $k_2$ and $k_3$ (yellow) share edges with both $i$ and $j$; the triangles that $k_1$, $k_2$ and $k_3$ form with $E_{ij}$ constitute the CNb triangles of $E_{ij}$.
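The common neighbourhood of an edge, the set of nodes adjacent to both i and j (N_i ∩ N_j), translates directly into set operations on the contact graph. A minimal sketch, assuming contacts are given as undirected residue-index pairs; the toy graph mirrors Figure 4, with three common neighbours of the edge between residues 0 and 1. The function names are illustrative.

```python
from collections import defaultdict


def neighbour_sets(contacts):
    """Build the neighbour set N_v of every residue v in an undirected contact graph."""
    neighbours = defaultdict(set)
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return neighbours


def common_neighbourhoods(contacts):
    """Map each contact (i, j) to its common neighbourhood N_i ∩ N_j."""
    neighbours = neighbour_sets(contacts)
    return {(i, j): neighbours[i] & neighbours[j] for i, j in contacts}


def cnb_triangles(edge, cnb):
    """List the triangles (i, j, k) that an edge forms with its common neighbours."""
    i, j = edge
    return [(i, j, k) for k in sorted(cnb[edge])]


# Usage: edge (0, 1) plays the role of E_ij; residues 2, 3 and 4 are its
# common neighbours (k1, k2, k3), while residue 5 touches only residue 0.
toy = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (0, 4), (1, 4), (0, 5)]
cnb = common_neighbourhoods(toy)
print(sorted(cnb[(0, 1)]))         # [2, 3, 4]
print(cnb_triangles((0, 1), cnb))  # [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
```

The CNb size of an edge, `len(cnb[edge])`, is the quantity plotted as a landscape over the contact map in Figure 5A.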

**Figure 5. Deriving the structural essence from cone-peeling strategy.**
A The contact-map visualization of the common neighbourhoods. The cone-shaped landscape of the CNbs results from low-CNb edges occupying the base of the cone and high-CNb edges occupying the summits. The colour bar shows the range of CNb sizes. B The cone-peeling strategy characterizes the structural essence better than random selection. The algorithm selects native contacts that have a high CNb and are long-range in sequence, and removes all local contacts. In all the proteins, the subsets selected by cone-peeling (blue) reconstruct better than random subsets of similar size (red), consistently achieving a PI > 1. For every protein, the ensemble-average Cα RMSD is reported. The sizes of the final subsets and the PIs of the individual proteins are given in Table 1. C The essential contacts (blue) obtained from cone-peeling are highlighted on the native structure of 1e6k (red) using PyMOL. With 4.3% of the Cα-Cα and 9% of the Cβ-Cβ contacts, the subsets achieve a PI of 1.74. D Overlay of the best reconstructed models onto the native structure (1e6k). The models reconstructed from the essential subsets obtained by the cone-peeling algorithm are superposed onto the native structure for comparison. The best models (in terms of Cα RMSD) are shown in ribbon representation (orange); the native structure is shown in cartoon representation (blue). The overlaid models have an average Cα RMSD of 4.5 Å to the native structure. Even though the models were reconstructed only from the essential subsets of contacts, the secondary-structure elements are well distinguished from the regions between them.
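Panel B states the selection criterion: keep native contacts that are long-range in sequence and have a high CNb, and discard local contacts. The full cone-peeling procedure (Figure 6) is not spelled out in this text, so the sketch below is only a simplified, hypothetical filter in that spirit, not the authors' algorithm: it ranks long-range contacts by CNb size and keeps a chosen fraction. `lr_min_sep` and `keep_fraction` are assumed parameters, not values from the paper.

```python
from collections import defaultdict


def select_high_cnb_long_range(contacts, lr_min_sep, keep_fraction):
    """Simplified, hypothetical stand-in for the cone-peeling selection.

    1. Drop local contacts (|i - j| < lr_min_sep).
    2. Rank the remaining long-range contacts by common-neighbourhood size,
       computed on the full native contact graph.
    3. Keep the top `keep_fraction` of them (at least one if any remain).
    """
    contacts = set(contacts)
    neighbours = defaultdict(set)
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)

    long_range = [(i, j) for i, j in contacts if abs(i - j) >= lr_min_sep]
    # Largest CNb first; ties broken by residue indices so the result is deterministic.
    ranked = sorted(
        long_range,
        key=lambda edge: (-len(neighbours[edge[0]] & neighbours[edge[1]]), edge),
    )
    if not ranked:
        return []
    keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:keep]


# Usage on a toy contact graph.
toy = [(0, 10), (0, 1), (1, 10), (0, 2), (2, 10), (3, 20), (0, 5)]
print(select_high_cnb_long_range(toy, lr_min_sep=5, keep_fraction=0.4))  # [(0, 10), (1, 10)]
```

A selected subset would then go through the same reconstruction and Cα RMSD scoring as the random reference sets of equal size.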

**Figure 6. The cone-peeling algorithm.**

Cited by

CONFOLD: Residue-residue contact-guided ab initio protein folding.
Adhikari B, Bhattacharya D, Cao R, Cheng J. Proteins. 2015 Aug;83(8):1436-49. doi: 10.1002/prot.24829. Epub 2015 Jun 6. PMID: 25974172.
Prediction of protein-binding areas by small-world residue networks and application to docking.
Pons C, Glaser F, Fernandez-Recio J. BMC Bioinformatics. 2011 Sep 26;12:378. doi: 10.1186/1471-2105-12-378. PMID: 21943333.
Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts.
Adhikari B, Cheng J. BMC Bioinformatics. 2017 Aug 29;18(1):380. doi: 10.1186/s12859-017-1807-5. PMID: 28851269.
Assessing Predicted Contacts for Building Protein Three-Dimensional Models.
Adhikari B, Bhattacharya D, Cao R, Cheng J. Methods Mol Biol. 2017;1484:115-126. doi: 10.1007/978-1-4939-6406-2_9. PMID: 27787823.
Probabilistic grammatical model for helix-helix contact site classification.
Dyrka W, Nebel JC, Kotulska M. Algorithms Mol Biol. 2013 Dec 18;8(1):31. doi: 10.1186/1748-7188-8-31. PMID: 24350601.
