Proc Natl Acad Sci U S A. 2006 Nov 21;103(47):17747-52.
doi: 10.1073/pnas.0605580103. Epub 2006 Nov 9.

Understanding ensemble protein folding at atomic detail

Isaac A Hubner et al. Proc Natl Acad Sci U S A. 2006.

Abstract

It has long been known that a protein's amino acid sequence dictates its native structure. However, despite significant recent advances, an ensemble description of how a protein achieves its native conformation from random coil under physiologically relevant conditions remains incomplete. Here we present a detailed all-atom model with a transferable potential that is capable of ab initio folding of entire protein domains using only sequence information. The computational efficiency of this model allows us to perform thousands of microsecond-time-scale folding simulations of the engrailed homeodomain and to observe thousands of complete, independent folding events. We apply a graph-theoretic analysis to this massive data set to elucidate which intermediates and intermediary states are common to many trajectories and thus important for the folding process. This method provides an atomically detailed and complete picture of a folding pathway at the ensemble level. The approach that we describe is quite general and could be used to study the folding of proteins on time scales orders of magnitude longer than currently possible.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The concept of a structural graph and flux (F). Each node (colored oval) represents a single protein conformation, and the edges (solid lines) represent connections based on structural similarity. The different colors indicate conformations from independent simulations, and the colored arrows indicate a dynamic relation (increasing step number) within each run. Each cluster is composed of multiple nodes from a single trajectory and/or from multiple trajectories. All trajectories are observed to converge to the GC (rightmost cluster), which contains the native conformations and has a flux of one. The smaller cluster with a flux of one corresponds to an obligate intermediate.
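The flux idea in this caption can be illustrated with a small sketch. It is not the authors' implementation: it assumes that clusters are simply the connected components of the similarity graph and that the flux of a cluster is the fraction of trajectories contributing at least one conformation to it. The function names `find_clusters` and `cluster_flux` and the toy data are hypothetical.

```python
from collections import defaultdict


def find_clusters(nodes, edges):
    """Group nodes into connected components using union-find.

    Assumption: a cluster is a connected component of the structural
    similarity graph; the paper may define clusters differently.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return list(clusters.values())


def cluster_flux(cluster, n_trajectories):
    """Fraction of trajectories with at least one conformation in the cluster."""
    trajectory_ids = {traj for traj, _step in cluster}
    return len(trajectory_ids) / n_trajectories


# Toy data: each node is (trajectory id, step number).
nodes = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
# Edges join structurally similar conformations (made up for illustration).
edges = [((0, 2), (1, 2)), ((1, 2), (2, 1)), ((0, 1), (1, 1))]

clusters = find_clusters(nodes, edges)
fluxes = sorted(cluster_flux(c, n_trajectories=3) for c in clusters)
```

In this toy graph the cluster visited by all three trajectories has a flux of one, which is the property the caption attributes to the GC and to an obligate intermediate.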
Fig. 2.
The results of structural cluster analysis with different OP. Each bar represents a high-flux cluster, with start and end values equal to its MFPT and MLET values. Colors indicate flux values of 1, 0.9, 0.8, 0.7, 0.6, and 0.5 for purple, blue, green, yellow, orange, and red, respectively (clusters with F < 0.5 were not plotted). The results from each individual OP's clustering are plotted against a separate y axis, in each case representing the <Rg> of the specified cluster, with an aligned x axis representing the MC time step. Although clusters identified by different OP may overlap in time, a single structure is never found in more than one cluster of any single graph. Also, a single trajectory may move between clusters; the average dwell time is often less than the difference between MLET and MFPT. Nevertheless, the order of events in folding (as represented by clusters under different OP) may be determined through the clusters' MFPT. The nature of these events can also be described through factors such as their MLET and their structural-energetic characteristics.
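Ordering clusters by their MFPT can be sketched as follows. The caption does not define the abbreviations, so this example assumes that MFPT is the mean, over trajectories, of the first step at which a trajectory enters a cluster, and that MLET is the mean of the last step at which it is found there. The function `cluster_times`, the cluster labels, and the toy trajectories are hypothetical.

```python
from statistics import mean


def cluster_times(trajectories):
    """Return {cluster: (mean first-entry step, mean last-visit step)}.

    Each trajectory is a list holding one cluster label per MC step,
    or None when the conformation belongs to no high-flux cluster.
    Only trajectories that visit a cluster contribute to its means.
    """
    first, last = {}, {}
    for labels in trajectories:
        seen_first, seen_last = {}, {}
        for step, label in enumerate(labels):
            if label is None:
                continue
            seen_first.setdefault(label, step)  # keep the earliest step only
            seen_last[label] = step             # overwrite with the latest step
        for label in seen_first:
            first.setdefault(label, []).append(seen_first[label])
            last.setdefault(label, []).append(seen_last[label])
    return {label: (mean(first[label]), mean(last[label])) for label in first}


# Toy trajectories; note that the first one revisits "C" after entering "I".
trajectories = [
    ["C", "C", "I", "C", "I", "N", "N"],
    ["C", None, "C", "I", "N", "N", "N"],
]

times = cluster_times(trajectories)
# Order of folding events, taken from increasing mean first-entry step.
order = sorted(times, key=lambda label: times[label][0])
```

Because the first toy trajectory moves back and forth between "C" and "I", the intervals for those two clusters overlap in time even though the ordering by first entry remains well defined, which mirrors the point made in the caption.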
Fig. 3.
The ENH folding intermediate. (a and b) The mutational model of the ENH folding intermediate (a) (36) is indistinguishable from the simulated intermediate (b) identified by structural cluster analysis. (c) A representative superposition of one experimental and one simulated model. As in experiment, the N terminus is largely helical but lacks long-range order. The aligned (ordered) regions in the mutational and computational models span residues 28–53, corresponding to the red and green colored regions. The peptide chain is colored from N (blue) to C (red) terminus.
Fig. 4.
ENH structure prediction. (Left) The superimposed native and top E-k prediction colored from N (blue) to C (red) termini (only backbone shown). (Upper Right) Predicted side-chain packing for core aromatic residues. (Lower Right) Predicted side-chain packing for surface salt bridges (native in black and predicted in CPK color).
Fig. 5.
Folding from a denatured (D) state, which rapidly undergoes nonspecific collapse (C). There are several C states, characterized by increasing compaction and helical content (see Fig. 9, which is published as supporting information on the PNAS web site, for a plot of the OPs as a function of time). After the protein becomes sufficiently helical, the chain extends through fluctuations to an expanded intermediate (I) state, which allows rearrangement of the helices and is followed by the TS. A final collapse to a near-native (NN) state ensues, which proceeds through specific side-chain packing and energetic relaxation to the native (N) state. C1, C2, C3, and I may undergo rapid conversion (as indicated by overlap in Fig. 3). The sequence of events in this representative trajectory is identical to the ordering of events in structural cluster analysis of the ensemble of folding trajectories.

