Matter. 2021 Oct 6;4(10):3195-3216.
doi: 10.1016/j.matt.2021.09.004. Epub 2021 Sep 22.

CryoFold: determining protein structures and data-guided ensembles from cryo-EM density maps


Mrinal Shekhar et al. Matter.

Abstract

Cryo-electron microscopy (EM) requires molecular modeling to refine structural details from data. Ensemble models arrive at low free-energy molecular structures, but are computationally expensive and limited to resolving only small proteins that cannot be resolved by cryo-EM. Here, we introduce CryoFold, a pipeline of molecular dynamics simulations that determines ensembles of protein structures directly from sequence by integrating density data of varying sparsity at 3-5 Å resolution with coarse-grained topological knowledge of the protein folds. We present six examples showing its broad applicability for folding proteins of 72 to 2000 residues, including large membrane and multi-domain systems, as well as results from two EMDB competitions. Driven by data from a single state, CryoFold discovers ensembles of common low-energy models together with rare low-probability structures that capture the equilibrium distribution of proteins constrained by the density maps. Many of these conformations, unseen by traditional methods, are experimentally validated and functionally relevant. We arrive at a set of best practices for data-guided protein folding that are controlled using a Python GUI.


Figures

Figure 1: An overview of the CryoFold protocol.
For a high-resolution density map (data-rich case), backbone tracing is performed using MAINMAST to determine Cα positions, and a random coil is fitted to these positions using targeted MD. This fitted protein model then serves as the search model for the subsequent MELD-ReMDFF cycles. For a low- or medium-resolution density map (data-poor case), a search model is constructed from the primary sequence using MELD. This search model is fitted into the density map using ReMDFF. The ReMDFF output is fed back to MELD for the next iteration, and the cycle continues until convergence. The last iteration of the cycle yields a refined model and a refined ensemble.
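The iterate-until-convergence cycle described in the Figure 1 caption can be sketched as a generic control loop. This is a minimal illustration, not the CryoFold implementation: `search_step` and `fit_step` are hypothetical callables standing in for a MELD run and a ReMDFF fit, `score` is assumed to be a fit metric such as the global CC (higher is better), and the tolerance-based stopping rule is an assumption chosen for illustration.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")  # any model representation (coordinates, file path, ...)


def refine_until_converged(
    model: M,
    search_step: Callable[[M], List[M]],  # hypothetical stand-in for a MELD run
    fit_step: Callable[[M], M],           # hypothetical stand-in for a ReMDFF fit
    score: Callable[[M], float],          # e.g. global CC to the map; higher is better
    tol: float = 1e-3,
    max_cycles: int = 10,
) -> Tuple[M, List[M], List[float]]:
    """Alternate ensemble search and density fitting until the score stops improving."""
    best_score = score(model)
    history = [best_score]
    ensemble: List[M] = [model]
    for _ in range(max_cycles):
        ensemble = search_step(model)           # generate an ensemble of candidates
        candidate = max(ensemble, key=score)    # pick the member that best fits the map
        fitted = fit_step(candidate)            # refine that member against the density
        new_score = score(fitted)
        history.append(new_score)
        gain = new_score - best_score
        if gain > 0:                            # keep the fitted model only if it improved
            model, best_score = fitted, new_score
        if gain < tol:                          # converged: improvement below tolerance
            break
    return model, ensemble, history


# Toy usage: models are plain numbers and the "map" is best matched at 3.0.
final, last_ensemble, scores = refine_until_converged(
    0.0,
    search_step=lambda x: [x - 0.5, x, x + 0.5],
    fit_step=lambda x: x + 0.5 * (3.0 - x),
    score=lambda x: -(x - 3.0) ** 2,
)
print(final, len(scores))
```

Passing the two stages in as callables keeps the loop independent of how a search or fit is actually run, which mirrors the caption's point that either a MAINMAST-derived model or a sequence-only MELD model can enter the same cycle.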
Figure 2: Ensemble models for TRPV1 and the refinement protocol for ubiquitin.
(A) Ensemble refinement with CryoFold showcased for the soluble domain of TRPV1. Several conformations from the TRPV1 ensemble are superimposed, colored from blue (N-terminus) to red (C-terminus). In a MELD-only simulation, a soluble loop (indicated in red) artifactually interacted with the transmembrane domains. With data guidance from ReMDFF, this loop instead interacted with the soluble domains, and a more focused ensemble that agrees with the density map was obtained. (B) Stages of the refinement protocol for a test case, ubiquitin. The initial model is an unfolded coil. MELD was used to generate 50 search models from the amino acid sequence alone, without using the density map data. These models were then rigid-body fitted into the density map using Chimera and ranked by their global correlation coefficient (CC). ReMDFF further refined the best rigid-fitted model. The ReMDFF model with the highest CC to the density map served as the template for the subsequent MELD iteration. Over two consecutive MELD-ReMDFF iterations, the RMSD of the folded model relative to the crystal structure (1UBQ) decreased from 25.04 Å to 2.53 Å. The RMSD for unlabeled Cα-Cα pairing, which reflects that the fit of atoms to a density map does not depend on residue labels, changes from 3.18 Å (step 1) → 1.99 Å (step 2) → 1.54 Å (step 3) → 1.28 Å (step 4). However, unlike all-atom RMSD, such estimates are less sensitive to the topological correctness of the model, since a model with poor connectivity can still show low deviations from the reference.
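The difference between labeled and unlabeled Cα-Cα RMSD in the caption can be made concrete with a small sketch. It assumes both coordinate sets already sit in the same frame (as they would after fitting into the same map), so no superposition is performed. The unlabeled variant finds the best one-to-one atom pairing by brute force over permutations, which only works for a handful of atoms; a real analysis would use an assignment solver such as SciPy's `linear_sum_assignment`. The function names are illustrative, not taken from CryoFold.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def labeled_rmsd(model: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD pairing atom i of the model with atom i of the reference (labels respected)."""
    assert len(model) == len(ref) > 0
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model, ref))
    return math.sqrt(total / len(ref))


def unlabeled_rmsd(model: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD under the best one-to-one pairing of atoms, ignoring residue labels.

    Brute force over all permutations, so only practical for a handful of atoms.
    """
    assert len(model) == len(ref) > 0
    best = min(
        sum(math.dist(model[i], ref[j]) ** 2 for i, j in enumerate(perm))
        for perm in permutations(range(len(ref)))
    )
    return math.sqrt(best / len(ref))


# A four-residue Cα trace and a model whose chain runs in the opposite direction.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = list(reversed(ref))
print(labeled_rmsd(model, ref), unlabeled_rmsd(model, ref))
```

The reversed chain has entirely wrong connectivity, yet its unlabeled RMSD is zero while its labeled RMSD is large. This is the caveat the caption raises about using unlabeled estimates alone.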
Figure 3: Hybrid structure determination of Flpp3.
(A) High-resolution density map at 1.8 Å resolution. An unfolded structure was used as the initial model. MAINMAST was applied to an SFX density map at 1.8 Å resolution to generate the Cα positions (green spheres), and the initial model was fitted into these positions by targeted MD. The resulting structure (green cartoon model) was then subjected to MELD-ReMDFF refinement. This procedure yielded a structure with an RMSD of 1.56 Å relative to the native SFX structure (yellow). The global CC of this structure is 0.83, and its MolProbity score is 0.93, with 94.34% Ramachandran-favored backbones and 98.78% favored side chains (Table S2). The Rosetta-EM model (cyan) has an RMSD of 1.28 Å with respect to the SFX structure. (B) Lower-resolution density map at 5 Å resolution. An initial Cα trace in the map was computed using MAINMAST. Subsequent MELD-ReMDFF refinement resulted in a structure (green cartoon model) with an RMSD of 2.29 Å from the SFX structure (yellow) (Table S3). The best Rosetta-EM model (cyan) has an RMSD of 2.35 Å to the SFX structure. Bar plots depict the evolution of the RMSD of the CryoFold models over successive MELD-ReMDFF refinements. The inset of the bar plot in panel B is a scatter plot of RMSD vs global CC for the first and second cycles of MELD refinement, shown in lime green and dark green, respectively.
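The global CC values quoted in these captions compare a model with the experimental map. A minimal sketch, assuming the model has already been converted into a simulated density sampled on the same voxel grid as the experimental map (that conversion is not shown), is a Pearson correlation over the flattened voxels. This is one common definition; tools such as Chimera also offer variants with and without mean subtraction, so the exact numbers in the paper may use a different formula. `rank_by_cc` is a hypothetical helper illustrating the ranking of fitted models by CC described for ubiquitin.

```python
import math
from typing import Dict, List, Sequence


def global_cc(model_map: Sequence[float], exp_map: Sequence[float]) -> float:
    """Pearson correlation between two maps sampled on the same voxel grid (flattened)."""
    n = len(model_map)
    assert n == len(exp_map) and n > 1
    mean_m = sum(model_map) / n
    mean_e = sum(exp_map) / n
    cov = sum((x - mean_m) * (y - mean_e) for x, y in zip(model_map, exp_map))
    var_m = sum((x - mean_m) ** 2 for x in model_map)
    var_e = sum((y - mean_e) ** 2 for y in exp_map)
    if var_m == 0 or var_e == 0:
        return 0.0  # a flat map carries no correlation signal
    return cov / math.sqrt(var_m * var_e)


def rank_by_cc(simulated: Dict[str, Sequence[float]], exp_map: Sequence[float]) -> List[str]:
    """Names of candidate models sorted from best to worst CC against the experimental map."""
    return sorted(simulated, key=lambda name: global_cc(simulated[name], exp_map), reverse=True)


# Toy five-voxel "maps" for three hypothetical fitted models.
exp_map = [0.0, 1.0, 2.0, 3.0, 4.0]
simulated = {"a": [0.0, 1.0, 2.0, 3.0, 4.0],
             "b": [4.0, 3.0, 2.0, 1.0, 0.0],
             "c": [0.0, 1.0, 2.0, 4.0, 3.0]}
print(rank_by_cc(simulated, exp_map))
```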
Figure 4: Modeling of the soluble domain of TRPV1.
(A) TRPV1 structures deposited in 2016 (PDB 5IRZ, yellow) and in 2013 (PDB 3J5P, cyan), shown in cartoon representation; the latter has a more resolved β-sheet, while the former possesses an additional extended loop. (B) The 5IRZ model was heated to 600 K using brute-force MD while constraining the α-helices. After 10 ns of simulation, this treatment produced a search model in which the loop regions deviated significantly and the β-sheets were completely denatured. The search model was then subjected to MELD-ReMDFF refinement. A single round of MELD regenerated most of the β-sheet from this random chain; however, the 5- to 15-residue interconnecting loops still occupied non-native positions. Subsequent ReMDFF refinement with the 5IRZ density restored the loop positions. One more round of MELD and ReMDFF further refined the model. The final refined model agrees well with 5IRZ. (C) Progress of the refinement at each step of CryoFold. MELD step 1 models the β-sheets correctly, the loops are recovered in ReMDFF step 2, and refinement is complete by step 4. The approach yielded structures with 93.75% Ramachandran-favored backbones, 92.37% favored side chains, and a MolProbity score of 1.67 (Table S4). As in the ubiquitin example, the RMSD for unlabeled Cα-Cα pairing changes from 2.25 Å (step 1) → 1.28 Å (step 2) → 1.15 Å (steps 3-4). (D) Analysis of the MELD ensembles from the first and second MELD-ReMDFF iterations. The scatter plot shows RMSD vs CC for each structure in both ensembles. The ensemble statistics shift significantly toward models consistent with the density map, while still capturing deviations around the best-fitted model and thereby accounting for data uncertainty.
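The shift in ensemble statistics described in panel D can be summarized numerically once each ensemble member has an (RMSD, CC) pair, the same two quantities plotted in the scatter plot. The sketch below is an illustrative summary only: the chosen statistics and the `summarize` helper are assumptions, and the numbers in the usage example are made up for demonstration, not values from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (RMSD in Å, CC) for one ensemble member


def summarize(ensemble: List[Sample]) -> Dict[str, float]:
    """Central tendency and spread of an ensemble, plus the best-fitted member."""
    rmsds = [r for r, _ in ensemble]
    ccs = [c for _, c in ensemble]
    rmsd_of_best, best_cc = max(ensemble, key=lambda s: s[1])  # highest CC to the map
    return {
        "mean_rmsd": mean(rmsds),
        "sd_rmsd": pstdev(rmsds),      # spread of the ensemble around its mean
        "mean_cc": mean(ccs),
        "best_cc": best_cc,
        "rmsd_of_best": rmsd_of_best,
    }


# Made-up ensembles standing in for two successive MELD-ReMDFF iterations.
iteration_1 = [(6.0, 0.55), (5.0, 0.60), (7.0, 0.50)]
iteration_2 = [(2.5, 0.78), (2.0, 0.82), (3.0, 0.75)]
s1, s2 = summarize(iteration_1), summarize(iteration_2)
print(s1)
print(s2)
```

Reporting the spread alongside the best-fitted member reflects the caption's point that the refined ensemble is centered on models consistent with the map while still retaining some deviation around them.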
Figure 5: CryoFold samples several biologically relevant states of the soluble domain of mitochondrial F1-F0 ATP synthase.
We modeled mitochondrial F1-F0 ATP synthase starting from PDB 6RET (state I), excluding the membrane-embedded region (grey) from refinement. CryoFold samples different conformations through a hinge motion in the OSCP region (orange) that connects the arm (blue) with the rotary domains (cyan). Clustering and 2D-RMSD analysis show that CryoFold samples conformations of additional ATP synthase states represented by PDB codes 6RDK and 6RDL (state IV). Other states, represented by PDB codes 6RDQ and 6RDR (state II) and 6RDW and 6RDX (state III), are shown in the SI.
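A 2D-RMSD analysis places each sampled conformation at a point whose coordinates are its RMSDs to two reference structures (for example, state I and state IV). The sketch below computes those points and, as a simplification of clustering, labels each frame with its nearest reference by RMSD. It assumes frames and references are already superposed and share atom ordering; the helper names and the two-atom "structures" are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Structure = Sequence[Coord]


def rmsd(a: Structure, b: Structure) -> float:
    """Labeled RMSD; assumes a and b are already superposed and share atom ordering."""
    assert len(a) == len(b) > 0
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def rmsd_2d(frames: List[Structure], ref_x: Structure,
            ref_y: Structure) -> List[Tuple[float, float]]:
    """One (RMSD to ref_x, RMSD to ref_y) point per frame, for a 2D-RMSD scatter plot."""
    return [(rmsd(f, ref_x), rmsd(f, ref_y)) for f in frames]


def assign_states(frames: List[Structure], refs: Dict[str, Structure]) -> List[str]:
    """Label each frame with the name of its nearest reference state by RMSD."""
    return [min(refs, key=lambda name: rmsd(frame, refs[name])) for frame in frames]


# Hypothetical two-atom "structures" standing in for two reference states.
refs = {"state I": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        "state IV": [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]}
frames = [[(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)],
          [(0.0, 0.0, 0.0), (1.0, 0.9, 0.0)]]
print(assign_states(frames, refs))
print(rmsd_2d(frames, refs["state I"], refs["state IV"]))
```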
Figure 6: Modeling the transmembrane magnesium channel CorA.
(A) The CryoFold protocol applied to CorA. Starting from a Cα trace computed from the cryo-EM density map using MAINMAST, refinement through successive cycles of MELD and ReMDFF produces a structure that agrees well with the reported native structure (yellow), featuring accurate β structures. (B) CryoFold produces narrower, more constrained ensembles as it iterates through MELD/MDFF. (C) Scatter plot of RMSD vs CC for the MELD ensembles at every stage of three MELD-ReMDFF iterations. The end model, obtained by ReMDFF refinement of the third-stage MELD ensemble, is 2.60 Å RMSD from the reported structure.
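One way to quantify the "narrower, more constrained ensembles" in panel B is the per-atom root-mean-square fluctuation (RMSF) about the ensemble-average position, averaged over atoms. This metric is an assumption chosen for illustration; the paper's own measure of ensemble width is not stated in the caption. The sketch assumes all ensemble members are superposed and share atom ordering.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Ensemble = Sequence[Sequence[Coord]]  # members x atoms x (x, y, z)


def rmsf(ensemble: Ensemble) -> List[float]:
    """Per-atom RMS fluctuation about the ensemble-average position."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    out = []
    for i in range(n_atoms):
        mean_pos = [sum(m[i][k] for m in ensemble) / n_models for k in range(3)]
        msd = sum(math.dist(m[i], mean_pos) ** 2 for m in ensemble) / n_models
        out.append(math.sqrt(msd))
    return out


def ensemble_width(ensemble: Ensemble) -> float:
    """Average RMSF over atoms: smaller means a tighter ensemble."""
    values = rmsf(ensemble)
    return sum(values) / len(values)


# Hypothetical one-atom ensembles from an early and a later iteration.
early = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
late = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)]]
print(ensemble_width(early), ensemble_width(late))
```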
