Proc Natl Acad Sci U S A. 2012 Jun 26;109(26):10340-5.

doi: 10.1073/pnas.1207864109. Epub 2012 Jun 12.

Genomics-aided structure prediction

Joanna I Sułkowska¹, Faruck Morcos, Martin Weigt, Terence Hwa, José N Onuchic

PMID: 22691493
PMCID: PMC3387073
DOI: 10.1073/pnas.1207864109

Joanna I Sułkowska et al. Proc Natl Acad Sci U S A. 2012.

Affiliation

¹ Center for Theoretical Biological Physics, University of California at San Diego, La Jolla, CA 92093-0374, USA.

Abstract

We introduce a theoretical framework that exploits the ever-increasing genomic sequence information for protein structure prediction. Structure-based models are modified to incorporate constraints from a large number of non-local contacts estimated from direct coupling analysis (DCA) of co-evolving genomic sequences. A simple hybrid method, called DCA-fold, which integrates DCA contacts with accurate knowledge of local information (e.g., the local secondary structure), is sufficient to fold proteins in the range of 1-3 Å resolution.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
The DCA-fold methodology: Domain family alignments are used as input for direct coupling analysis (DCA), which generates a large number of accurate contact predictions (13). The DCA contacts are used to drive folding simulations, based on a modified structure-based model (SBM). The Hamiltonian of the SBM contains an interresidue contact potential for local and non-local contacts. Local information is described by a torsional potential together with the local secondary structure, which may be derived from a variety of methods (*SI Appendix* and, for example, ref. 27).

**Fig. 2.**
Comparison of estimated contact maps with native maps for four exemplary proteins (see *SI Appendix*, Fig. S1 for maps of all proteins studied). The lower triangles (below the diagonal) show DCA contact maps and the upper triangles show native maps with a cutoff value of 5 Å. The prediction results shown in Figs. 3 and 4 and Table 1 used as input a set of contacts estimated using DCA. DCA produces high-quality estimates of contact maps, both in the number of true positive predictions and in the sparsity of the predicted contacts. Other statistical methods, such as mutual information, produce a relatively good number of true positive contacts, but these tend to cluster in specific regions, which obscures the global structure of the native contact map (13).
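The comparison in Fig. 2 can be illustrated with a minimal sketch: build a native contact map from coordinates using a 5 Å cutoff, then measure what fraction of predicted contacts are native. Several details here are assumptions rather than the authors' procedure: one representative coordinate per residue, the `min_sep` sequence-separation filter, the function names, and the toy coordinates.

```python
import math

def contact_map(coords, cutoff=5.0, min_sep=3):
    """Return residue pairs (i, j), i < j, whose distance is below `cutoff` (in Å).

    `coords` holds one (x, y, z) point per residue (an assumption), and pairs
    closer than `min_sep` along the sequence are skipped (also an assumption).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

def true_positive_rate(predicted, native):
    """Fraction of predicted contacts that also appear in the native map."""
    if not predicted:
        return 0.0
    return len(predicted & native) / len(predicted)

# Toy hairpin: six residues spaced 3.8 Å apart, folded back on themselves.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
native = contact_map(toy_coords)
predicted = {(0, 5), (2, 5)}  # hypothetical predictions, one correct, one wrong
tpr = true_positive_rate(predicted, native)
```

Storing contacts as a set of index pairs keeps the map sparse, which matches the caption's emphasis on sparsity as well as true positives.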

**Fig. 4.**
Protein structures predicted using DCA-fold (Table 1 and *SI Appendix*, Table S1). Their predicted contact maps are shown in Fig. 2. Prediction accuracy for complete proteins is measured by RMSD and by the Q metric (Q_total), where the latter characterizes the difference between the predicted and target structures independently of alignment (*SI Appendix*, section 7). These structures are predicted based on DCA contact maps. The results in column 1 were obtained using native contact distances and local information. These structures are indistinguishable from the native ones (*SI Appendix*, Fig. S3). The results in column 2 were obtained using a statistical contact potential and native local information. Columns 3 and 4 show predictions where the local information used was also estimated, based on the type of secondary structure (SS) a residue belongs to. The native SS classification was used in column 3 and a simple SS estimator based on DCA output was used in column 4.
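The exact definition of Q_total is given in the *SI Appendix* and is not reproduced here. As a hedged illustration of how a structural similarity score can be independent of alignment, the sketch below compares pairwise distances within each structure rather than superimposed coordinates. The Gaussian form, the width `sigma = |i - j| ** 0.15`, the `min_sep` filter, and the function name `q_score` are all assumptions for illustration.

```python
import math

def q_score(pred, ref, min_sep=3, sigma_exp=0.15):
    """Distance-based similarity in [0, 1]; 1 means identical pair distances.

    Only internal distances are used, so rotating or translating either
    structure leaves the score unchanged. The functional form is an
    assumption, not the paper's definition.
    """
    n = len(ref)
    if len(pred) != n:
        raise ValueError("structures must have the same number of residues")
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + min_sep, n):
            sigma = (j - i) ** sigma_exp  # tolerance grows with sequence separation
            diff = math.dist(pred[i], pred[j]) - math.dist(ref[i], ref[j])
            total += math.exp(-diff * diff / (2.0 * sigma * sigma))
            count += 1
    return total / count if count else 0.0

# Toy reference structure and a rigidly moved copy (90° rotation about z plus a shift).
ref_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
moved = [(-y + 10.0, x - 2.0, z + 1.0) for x, y, z in ref_coords]
# A distorted copy: the chain is stretched out into a straight line.
stretched = [(3.8 * k, 0.0, 0.0) for k in range(6)]
```

Because no superposition step is involved, the score for `moved` equals the score for the reference itself, while `stretched` scores lower.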

**Fig. 3.**
Predicted RMSDs for 15 proteins of different sizes. The symbols indicate the nature of the information on local and non-local residue interactions. The results shown here correspond to the RMSD for 80% of the residues in the protein in order to avoid the effect of outliers (Table 1 and *SI Appendix*, section 7). Non-local interactions are derived from DCA contacts or random maps (control predictions indicated by symbol ×). Local information is obtained from the native structure or is estimated based on the local secondary structure (SS) classification. The SS classification (α helix or β strand) is obtained via the native structure or is estimated from patterns of the DCA contact map (*SI Appendix*, section 4.3). Open symbols refer to proteins used to derive the statistical potentials, while filled symbols refer to proteins that were used to test this model. The lines are guides to trends by symbols of the same color.
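The idea of reporting RMSD over 80% of the residues to limit the effect of outliers can be sketched as a trimmed RMSD. The details below are assumptions, since the actual procedure is described in *SI Appendix*, section 7: the two structures are taken to be already superimposed, the retained 80% are the residues with the smallest deviations, and the function name is hypothetical.

```python
import math

def rmsd_best_fraction(pred, ref, fraction=0.8):
    """RMSD over the `fraction` of residues with the smallest deviations.

    Assumes `pred` and `ref` are already superimposed and list residues in
    the same order; no structural alignment is performed here.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("structures must be non-empty and of equal length")
    # Squared per-residue deviations, smallest first.
    sq_dev = sorted(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    keep = max(1, round(fraction * len(sq_dev)))
    return math.sqrt(sum(sq_dev[:keep]) / keep)

# Toy case: eight residues off by 1 Å and two badly misplaced tail residues.
ref_chain = [(float(k), 0.0, 0.0) for k in range(10)]
pred_chain = [(float(k), 1.0 if k < 8 else 10.0, 0.0) for k in range(10)]

trimmed = rmsd_best_fraction(pred_chain, ref_chain)       # ignores the two outliers
full = rmsd_best_fraction(pred_chain, ref_chain, 1.0)     # includes every residue
```

In this toy case the two misplaced residues dominate the full RMSD, whereas the trimmed value reflects the 1 Å error of the well-predicted core.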

References

1. Zhang Y, Skolnick J. Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci USA. 2004;101:7594–7599.
2. Leaver-Fay A, et al. ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–574.
3. Fiser A, Sali A. Modeller: Generation and refinement of homology-based protein structure models. Methods Enzymol. 2003;374:461–491.
4. Moult J, Fidelis K, Kryshtafovych J, Rost B, Tramontano A. CASP8 Proceedings. Proteins. 2009;77(Suppl 9):1–228.
5. Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286:295–299.
