Proc Natl Acad Sci U S A. 2012 Jun 26;109(26):10340-5.
doi: 10.1073/pnas.1207864109. Epub 2012 Jun 12.

Genomics-aided structure prediction

Joanna I Sułkowska et al. Proc Natl Acad Sci U S A. 2012.

Abstract

We introduce a theoretical framework that exploits the ever-increasing genomic sequence information for protein structure prediction. Structure-based models are modified to incorporate constraints by a large number of non-local contacts estimated from direct coupling analysis (DCA) of co-evolving genomic sequences. A simple hybrid method, called DCA-fold, integrating DCA contacts with an accurate knowledge of local information (e.g., the local secondary structure) is sufficient to fold proteins in the range of 1-3 Å resolution.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The DCA-fold methodology: Domain family alignments are used as input for direct coupling analysis (DCA), which generates a large number of accurate contact predictions (13). The DCA contacts are used to drive folding simulations, based on a modified structure-based model (SBM). The Hamiltonian of the SBM contains an interresidue contact potential for local and non-local contacts. Local information is described by a torsional potential together with the local secondary structure, which may be derived from a variety of methods (SI Appendix and, for example, ref. 27).
Fig. 2.
Comparison of estimated contact maps with native maps for four exemplary proteins (see SI Appendix, Fig. S1 for maps of all proteins studied). The lower triangle of each map, below the diagonal, shows the DCA contact map; the upper triangle shows the native map with a cutoff value of 5 Å. The prediction results shown in Figs. 3 and 4 and Table 1 used as input a set of contacts estimated using DCA. DCA produces high-quality estimates of contact maps, both in the number of true positive predictions and in the sparsity of the predicted contacts. Other statistical methods, such as mutual information, produce a relatively good number of true positive contacts, but these tend to cluster in specific regions, which obscures the global structure of the native contact map (13).
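The comparison described in this caption can be illustrated with a minimal Python sketch. It builds a native contact set from coordinates using a 5 Å cutoff and scores a list of predicted pairs by the fraction that are true positives. The function names, the use of one representative point per residue, and the `min_sep` sequence-separation filter are assumptions made for illustration; the caption does not say which atoms define a contact or how short-range pairs are treated.

```python
import math

def native_contacts(coords, cutoff=5.0, min_sep=4):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff` (in Å).

    coords  : one (x, y, z) point per residue (assumed representation)
    min_sep : minimum sequence separation |i - j| (hypothetical parameter)
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

def true_positive_fraction(predicted, native):
    """Fraction of predicted pairs that are also native contacts."""
    # Normalise pair order so (j, i) and (i, j) count as the same contact.
    predicted = {tuple(sorted(pair)) for pair in predicted}
    if not predicted:
        return 0.0
    return len(predicted & native) / len(predicted)

# Toy hairpin: six residues, two strands 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
native = native_contacts(coords, cutoff=5.0, min_sep=3)
score = true_positive_fraction([(5, 0), (1, 4), (2, 5)], native)
```

In this toy case two of the three predicted pairs fall within the cutoff. A sparsity measure, which the caption also emphasises, would need an additional statistic on how the predicted pairs are spread across the map.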
Fig. 4.
Protein structures predicted using DCA-fold (Table 1 and SI Appendix, Table S1). Their predicted contact maps are shown in Fig. 2. Prediction accuracy for complete proteins is measured in RMSD and by the Q metric (Qtotal), where the latter characterizes the difference between the predicted and target structures independently of alignment (SI Appendix, section 7). These structures are predicted based on DCA contact maps. The results in column 1 were obtained using native contact distances and local information. These structures are indistinguishable from the native ones (SI Appendix, Fig. S3). The results in column 2 were obtained using a statistical contact potential and native local information. Columns 3 and 4 show predictions where the local information used was also estimated, based on the type of secondary structure (SS) a residue belongs to. The native SS classification was used in column 3 and a simple SS estimator based on DCA output was used in column 4.
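An alignment-independent score of the kind mentioned in this caption can be sketched as follows. The exact definition of Qtotal is given in SI Appendix, section 7, and is not reproduced here, so the Gaussian form below, the width `sigma = |i - j| ** 0.15`, and the `min_sep` parameter are assumptions based on a commonly used pairwise-distance Q score, not the authors' confirmed formula.

```python
import math

def q_score(pred, target, min_sep=2, exponent=0.15):
    """Pairwise-distance similarity in [0, 1]; 1 means identical distance maps.

    pred, target : lists of (x, y, z) points, one per residue
    min_sep      : smallest sequence separation included (assumed)
    exponent     : controls how the tolerance grows with |i - j| (assumed)
    """
    n = len(target)
    if len(pred) != n:
        raise ValueError("structures must have the same number of residues")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + min_sep, n):
            sigma = (j - i) ** exponent
            diff = math.dist(pred[i], pred[j]) - math.dist(target[i], target[j])
            total += math.exp(-(diff * diff) / (2.0 * sigma * sigma))
            pairs += 1
    return total / pairs if pairs else 0.0
```

Because the score uses only internal distances, it is unchanged when either structure is rotated or translated, which is why no superposition step is needed, unlike for RMSD.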
Fig. 3.
Predicted RMSDs for 15 proteins of different sizes. The symbols indicate the nature of the information on local and non-local residue interactions. The results shown here correspond to the RMSD for 80% of the residues in the protein in order to avoid the effect of outliers (Table 1 and SI Appendix, section 7). Non-local interactions are derived from DCA contacts or random maps (control predictions indicated by symbol ×). Local information is obtained from the native structure or is estimated based on the local secondary structure (SS) classification. The SS classification (α helix or β strand) is obtained via the native structure or is estimated from patterns of the DCA contact map (SI Appendix, section 4.3). Open symbols refer to proteins used to derive the statistical potentials, while filled symbols refer to proteins that were used to test this model. The lines are guides to trends by symbols of the same color.
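The idea of reporting RMSD over 80% of the residues to limit the effect of outliers can be illustrated with a short sketch. It assumes the two structures are already superimposed and that the retained 80% are the residues with the smallest deviations; the actual selection procedure is described in SI Appendix, section 7, and may differ. The function name and the `fraction` parameter are hypothetical.

```python
import math

def trimmed_rmsd(pred, target, fraction=0.8):
    """RMSD over the best-fitting `fraction` of residues.

    pred, target : lists of (x, y, z) points, one per residue,
                   assumed to be superimposed already
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("structures must be non-empty and of equal length")
    # Squared per-residue deviations, smallest first.
    sq_dev = sorted(math.dist(p, t) ** 2 for p, t in zip(pred, target))
    # Keep at least one residue; the small epsilon guards against float round-off.
    keep = max(1, int(fraction * len(sq_dev) + 1e-9))
    return math.sqrt(sum(sq_dev[:keep]) / keep)

# Four residues deviate by 1 Å, one outlier deviates by 10 Å.
target = [(float(i), 0.0, 0.0) for i in range(5)]
pred = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0),
        (3.0, 1.0, 0.0), (4.0, 10.0, 0.0)]
full = trimmed_rmsd(pred, target, fraction=1.0)
trimmed = trimmed_rmsd(pred, target, fraction=0.8)
```

In this example the single outlier dominates the full RMSD, while the trimmed value reflects the 1 Å deviation of the remaining residues.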

