Cell Syst. 2017 Sep 27;5(3):202-211.e3.
doi: 10.1016/j.cels.2017.09.001.

Folding Membrane Proteins by Deep Transfer Learning

Sheng Wang et al. Cell Syst. 2017.

Abstract

Computational elucidation of membrane protein (MP) structures is challenging, partly because too few solved structures are available for homology modeling. Here, we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-MPs and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs, and generates 3D models with root-mean-square deviation (RMSD) less than 4 and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the Continuous Automated Model Evaluation (CAMEO) project shows that our method predicted high-resolution 3D models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimated that our method could predict correct folds for 1,345-1,871 reviewed human multi-pass MPs, including a few hundred new folds, which should facilitate the discovery of drugs targeting MPs.
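
The abstract describes using predicted contacts as distance restraints for 3D modeling. The following is a minimal sketch of that idea only, not the authors' pipeline. The 8 Å upper bound, the minimum sequence separation, the number of contacts kept, and the function name `contacts_to_restraints` are all assumptions introduced here for illustration.

```python
from typing import Dict, List, Tuple

# Assumption (not stated in the abstract): a contact is modeled as an
# upper bound of 8 Angstrom on the distance between the two residues.
CONTACT_UPPER_BOUND = 8.0


def contacts_to_restraints(
    probs: Dict[Tuple[int, int], float],
    seq_len: int,
    per_residue: float = 1.0,
    min_separation: int = 6,
) -> List[Tuple[int, int, float]]:
    """Convert predicted contact probabilities into distance restraints.

    probs maps a residue pair (i, j), 0-based with i < j, to a predicted
    contact probability. The top `seq_len * per_residue` pairs whose
    sequence separation is at least `min_separation` are kept, and each
    becomes a restraint (i, j, upper_bound_in_angstrom).
    """
    wanted = int(seq_len * per_residue)
    candidates = [
        (p, i, j) for (i, j), p in probs.items() if j - i >= min_separation
    ]
    # Highest probability first; ties broken by residue index for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, j, CONTACT_UPPER_BOUND) for _, i, j in candidates[:wanted]]


if __name__ == "__main__":
    # Hypothetical toy predictions for illustration only.
    toy = {(0, 10): 0.9, (1, 3): 0.99, (2, 20): 0.5, (5, 30): 0.7}
    for restraint in contacts_to_restraints(toy, seq_len=2):
        print(restraint)
```

A folding program would then search for conformations that satisfy as many of these upper bounds as possible. How many contacts to keep is a tuning choice that this sketch exposes through `per_residue`.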

Keywords: co-evolution analysis; deep learning; deep transfer learning; homology modeling; membrane protein contact prediction; membrane protein folding; multiple sequence alignment.

Figures

Figure 1
Overview of our deep learning model for MP contact prediction where L is the sequence length of one MP under prediction.
Figure 2
Top L/5 long-range accuracy (A), medium-range accuracy (B), and TMscore of the best of top 5 3D models (C) generated by our three models Mixed (cyan), NonMP-only (purple), and MP-only (green), and by CCMpred (blue) and MetaPSICOV (red), with respect to ln(Meff). (D) Summary results of all tested methods in terms of modeling accuracy. Column ‘#<XÅ’ lists the number of MPs whose 3D models have RMSD ≤ X Å. Column ‘#TM>Y’ lists the number of MPs whose 3D models have TMscore ≥ Y. The overlined RMSD (TMscore) shows the average RMSD (TMscore) of all 510 MPs. TBM(MP) and TBM(NonMP) stand for template-based modeling with membrane proteins as templates and without membrane proteins as templates, respectively.
Figure 3
Quality comparison of the best of top 5 contact-assisted models generated by our two methods, CCMpred and MetaPSICOV. (A) Mixed vs. CCMpred; (B) Mixed vs. MetaPSICOV; (C) NonMP vs. CCMpred; (D) NonMP vs. MetaPSICOV.
Figure 4
Case study of one CAMEO target 5h35E. (A) The long- and medium-range contact prediction accuracy of our methods, MetaPSICOV, CCMpred, and EVfold (web server). (B–D) The overlap between the native contact map and contact maps predicted by our method, CCMpred, MetaPSICOV, and EVfold. Top L predicted all-range contacts are displayed. A grey, red and green dot represents a native contact, a correct prediction and a wrong prediction, respectively. (E) The superimposition between our predicted model (in red) and the native structure (in green).
Figure 5
(A) TMscore with respect to ln(Neff), based upon the 354 multi-pass membrane proteins in PDB. (B) ln(Neff) distribution of the 354 multi-pass MPs in PDB. (C) ln(Neff) distribution of the 2215 reviewed human multi-pass MPs.
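
The "top L/5 long-range" and "medium-range" accuracies reported in Figure 2 can be sketched as below. This is an illustrative sketch, not the authors' evaluation code. The separation thresholds (medium-range: 12 to 23 residues apart; long-range: 24 or more) are the commonly used convention and are assumed here because the page does not state them. The function name `top_k_accuracy` is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]


def top_k_accuracy(
    probs: Dict[Pair, float],
    native: Set[Pair],
    seq_len: int,
    min_sep: int,
    max_sep: Optional[int] = None,
    divisor: int = 5,
) -> float:
    """Fraction of the top L/divisor predicted contacts that are native.

    probs maps a residue pair (i, j) with i < j to a predicted probability;
    native is the set of true contact pairs. Only pairs whose sequence
    separation lies in [min_sep, max_sep] are considered.
    """
    k = max(1, seq_len // divisor)
    in_range = [
        (p, pair)
        for pair, p in probs.items()
        if pair[1] - pair[0] >= min_sep
        and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    # Highest probability first; ties broken by pair for determinism.
    in_range.sort(key=lambda t: (-t[0], t[1]))
    selected = [pair for _, pair in in_range[:k]]
    if not selected:
        return 0.0
    # Design choice in this sketch: divide by the number of pairs actually
    # selected, which can be smaller than k when few pairs are in range.
    correct = sum(1 for pair in selected if pair in native)
    return correct / len(selected)


if __name__ == "__main__":
    # Hypothetical toy data for a 60-residue protein.
    predicted = {(0, 30): 0.9, (1, 40): 0.8, (2, 50): 0.7, (3, 16): 0.95}
    truth = {(0, 30), (2, 50), (3, 16)}
    print("long-range:", top_k_accuracy(predicted, truth, 60, min_sep=24))
    print("medium-range:", top_k_accuracy(predicted, truth, 60, 12, 23))
```

Evaluating only the top L/5 predictions focuses the metric on the most confident contacts, which are the ones most likely to be used as restraints.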
