Proc Natl Acad Sci U S A. 2019 Aug 20;116(34):16856-16865.
doi: 10.1073/pnas.1821309116. Epub 2019 Aug 9.

Distance-based protein folding powered by deep learning

Jinbo Xu. Proc Natl Acad Sci U S A. 2019.

Abstract

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming fragment-based conformation sampling. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets (median family size of 58 effective sequence homologs) within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins, while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
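
The "precision on the top L/5 long-range predicted contacts" quoted above can be illustrated with a minimal Python sketch. The abstract does not spell out the definitions, so the sketch assumes the usual CASP conventions: two residues are in contact when their Cβ–Cβ distance is below 8 Å, and a pair is long-range when the residues are at least 24 positions apart in sequence. The function name and its parameters are hypothetical and are not taken from the paper or its software.

```python
from typing import List


def top_long_range_precision(
    prob: List[List[float]],
    native_dist: List[List[float]],
    min_sep: int = 24,             # assumed long-range cutoff: |i - j| >= 24
    contact_cutoff: float = 8.0,   # assumed contact definition: distance < 8 angstroms
    fraction: int = 5,             # evaluate the top L/5 predictions
) -> float:
    """Precision of the top L/fraction long-range predicted contacts.

    prob[i][j] is the predicted contact probability for residues i and j;
    native_dist[i][j] is their distance in the experimental structure.
    """
    length = len(prob)
    # Collect every long-range residue pair once (i < j).
    pairs = [
        (prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Highest predicted contact probability first.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[: max(1, length // fraction)]
    if not top:
        return 0.0  # protein too short to have any long-range pair
    hits = sum(1 for _, i, j in top if native_dist[i][j] < contact_cutoff)
    return hits / len(top)
```

For a protein of length L, the function ranks all long-range pairs by predicted probability, keeps the first L/5 of them, and reports the fraction that are true contacts in the native structure.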

Keywords: deep learning; direct coupling analysis; protein contact prediction; protein distance prediction; protein folding.

Conflict of interest statement

The author declares no conflict of interest.

Figures

Fig. 1. The overall deep network architecture for protein distance prediction.

Fig. 2. Distance prediction and folding results on the 37 CASP12 FM and 41 CAMEO hard targets. (A) Quality of distance- vs. contact-based 3D models predicted by our method. (B) Distance-based 3D model quality vs. logarithm of Meff. (C) Cβ–Cβ distance prediction error vs. logarithm of Meff. (D) Distance-based 3D model quality vs. Cβ–Cβ distance prediction error. Here model quality or quality of a model denotes the quality of a predicted 3D model measured by TMscore.

Fig. 3. Distance prediction and folding results of RaptorX-Contact on the 32 CASP13 FM targets. (A) Cβ–Cβ distance prediction error vs. logarithm of Meff. (B) Three-dimensional model quality vs. contact precision. (C) Three-dimensional model quality vs. Cβ–Cβ distance prediction error. (D) Three-dimensional model quality vs. logarithm of Meff.

Fig. 4. Novelty of RaptorX-Contact 3D models. (A) RaptorX-Contact first model quality vs. target-training structure similarity. (B) Structure similarity between RaptorX-Contact models and training proteins. (C) First model quality of RaptorX-Contact vs. CNFpred. (D) First model quality of RaptorX-Contact vs. RaptorX-TBM.

Fig. 5. Relationship between RaptorX-Contact model quality and distance violation.

Fig. 6. Contacts predicted by CCMpred (upper right triangle) and RaptorX-Contact (lower left triangle) on T0950-D1, T0957s2-D1, T0975-D1, and T0980s1-D1. Native, correctly predicted, and incorrectly predicted contacts are displayed in gray, red, and green, respectively. Top n medium- and long-range predicted contacts are displayed where n is the number of native contacts.
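
Several of the captions above plot results against the logarithm of Meff, the number of effective sequence homologs, which this page does not define. The Python sketch below follows one common definition, stated here as an assumption: each sequence in the multiple sequence alignment is weighted by one over the number of sequences (itself included) that share at least 70% identity with it, and Meff is the sum of those weights. The 70% threshold, the identity measure, and the function names are illustrative choices, not details confirmed by this page.

```python
from typing import List


def sequence_identity(a: str, b: str) -> float:
    """Fraction of alignment columns where both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    # Columns where both sequences have a gap do not count as matches.
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same / len(a)


def effective_sequences(msa: List[str], identity_cutoff: float = 0.7) -> float:
    """Meff as the sum of per-sequence weights 1 / (cluster size).

    identity_cutoff = 0.7 is an assumed, commonly used threshold.
    """
    meff = 0.0
    for i, s in enumerate(msa):
        # Count the sequence itself plus every other sequence similar enough to it.
        neighbours = 1 + sum(
            1
            for j, t in enumerate(msa)
            if j != i and sequence_identity(s, t) >= identity_cutoff
        )
        meff += 1.0 / neighbours
    return meff
```

Under this definition, near-duplicate sequences share a single unit of weight, so an alignment with many redundant homologs can still have a small Meff; taking `math.log(effective_sequences(msa))` gives a quantity comparable to the horizontal axes described in the captions.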
