J Chem Theory Comput. 2026 Jan 13;22(1):78-94.
doi: 10.1021/acs.jctc.5c01579. Epub 2026 Jan 2.

Efficient Sampling of Short Protein Trajectories with Conditional Diffusion Models

Chuanye Xiong et al. J Chem Theory Comput.

Abstract

Understanding how protein structures dictate their diverse biological functions remains one of the central and enduring challenges in structural biology. The development of AlphaFold and ESMAtlas marks a significant advance in protein science, enabling the reliable prediction of protein structure directly from amino acid sequence. This advance in structure prediction underscores the need for complementary methods that can explore conformational space and enable efficient sampling of dynamic trajectories. Here, we present TSS-Pro, a conditional generative diffusion framework that enables efficient sampling of protein conformational trajectory space. TSS-Pro takes the initial frame as conditional input and generates protein conformational trajectories. It supports two sampling strategies: (1) consecutive sampling, where each trajectory segment is generated step by step by conditioning on the final frame of the previously generated segment, enabling temporally coherent propagation of structural transitions; (2) parallel sampling, where multiple trajectory branches are independently generated from initial conditions to enhance conformational diversity. We validate TSS-Pro on three representative systems of increasing complexity: alanine dipeptide, ubiquitin, and Drosophila cryptochrome (dCRY). TSS-Pro reproduces the free energy landscape of alanine dipeptide. In the case of ubiquitin, consecutive sampling with TSS-Pro overcomes local minima and uncovers distinct conformational states of the C-terminal region. For the large protein dCRY, TSS-Pro achieves high efficiency through parallel trajectory sampling, enabling conformational and dynamic exploration typically accessible only through extensive simulations. TSS-Pro paves the way for high-throughput exploration of protein trajectories and conformational landscapes for large and complex systems.
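The two sampling strategies can be illustrated with a minimal sketch. The function `sample_segment` below is a hypothetical placeholder (a Gaussian random walk), not the TSS-Pro diffusion model; the function names, array shapes, and step size are all assumptions made for illustration. Only the control flow, chaining on the final frame versus branching independently, follows the description above.

```python
import numpy as np

def sample_segment(cond_frame, n_frames, rng):
    """Hypothetical stand-in for a conditional trajectory sampler.

    Returns n_frames frames that follow cond_frame. This is a plain
    Gaussian random walk, NOT the trained TSS-Pro model.
    """
    steps = rng.normal(scale=0.05, size=(n_frames,) + cond_frame.shape)
    return cond_frame + np.cumsum(steps, axis=0)

def consecutive_sampling(initial_frame, n_segments, n_frames, rng):
    """Chain segments: each is conditioned on the last frame of the previous one."""
    cond = initial_frame
    segments = []
    for _ in range(n_segments):
        seg = sample_segment(cond, n_frames, rng)
        segments.append(seg)
        cond = seg[-1]  # final frame becomes the next condition
    return np.concatenate(segments, axis=0)

def parallel_sampling(initial_frames, n_frames, rng):
    """Generate one independent segment (branch) per conditioning frame."""
    return np.stack([sample_segment(f, n_frames, rng) for f in initial_frames])

rng = np.random.default_rng(0)
x0 = np.zeros((22, 3))  # toy system: 22 atoms with xyz coordinates
traj = consecutive_sampling(x0, n_segments=4, n_frames=5, rng=rng)
branches = parallel_sampling([x0, x0, x0], n_frames=5, rng=rng)
print(traj.shape, branches.shape)
```

In this sketch the consecutive run yields one long trajectory, while the parallel run yields a batch of short branches whose generation does not depend on one another and could therefore be distributed across devices.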

Conflict of interest statement

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jctc.5c01579

The authors declare no competing financial interest.

Figures

Figure 1.
Architecture of TSS-Pro. Molecular dynamics (MD) trajectories were used as training data. During the backward diffusion process, the initial frame was incorporated as a conditional input to guide trajectory generation. The framework was evaluated on three systems: (1) alanine dipeptide; (2) ubiquitin; (3) Drosophila cryptochrome (dCRY). Two sampling strategies were implemented: consecutive sampling, which generates trajectories stepwise by refining the last frame of each segment and using it as the starting condition for the next segment, ensuring temporal continuity; and parallel sampling, which generates multiple trajectory segments independently from one or more reference frames, thereby increasing structural diversity and accelerating exploration of conformational space.
Figure 2.
Autocorrelation function (ACF) and partial autocorrelation function (PACF) for alanine dipeptide, ubiquitin, and dCRY trajectory data. Temporal features analyzed include: (1) Root-mean-square deviation (RMSD, mint green), quantifying structural deviation relative to the initial frame; (2) Mean backbone dihedral angle ϕ (torsion around the N–Cα bond, lavender); (3) Mean backbone dihedral angle ψ (torsion around the Cα–C bond, pink); (4) Mean dihedral angle ω (torsion around the peptide bond C–N, beige); and (5) One-dimensional protein coordinate representations derived from the autoencoder (AE) model (sky blue). The ACF measures both direct and indirect correlations between a variable and its past values, providing an overall assessment of temporal dependence across lag times. In contrast, the PACF isolates the direct contribution of each lag after accounting for the effects of shorter lags.
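The distinction between ACF and PACF can be made concrete with a short NumPy sketch. This is an illustrative implementation, not the analysis code used in the paper: the ACF is the normalized sample autocovariance, and the PACF is obtained from it with the Durbin-Levinson recursion. The test series is a synthetic AR(1) process with coefficient 0.8, chosen here as an assumption because its ACF decays geometrically while its PACF vanishes beyond lag 1.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def pacf(x, max_lag):
    """Partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    pac = np.zeros(max_lag + 1)
    pac[0] = 1.0
    phi_prev = np.zeros(0)  # AR coefficients of order k-1
    for k in range(1, max_lag + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pac[k] = phi_kk
    return pac

# Synthetic AR(1) series standing in for a trajectory feature such as RMSD.
rng = np.random.default_rng(0)
noise = rng.normal(size=20000)
series = np.zeros_like(noise)
for t in range(1, len(noise)):
    series[t] = 0.8 * series[t - 1] + noise[t]

acf_vals = acf(series, max_lag=5)
pacf_vals = pacf(series, max_lag=5)
print(np.round(acf_vals, 2))
print(np.round(pacf_vals, 2))
```

For such a series the ACF stays well above zero at lag 2 (indirect correlation carried through lag 1), whereas the PACF at lag 2 is close to zero because no direct dependence remains once lag 1 is accounted for.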
Figure 3.
Average RMSD values for generated trajectories of varying lengths for (a) alanine dipeptide, (b) ubiquitin, and (c) dCRY, computed from 20 generated segments per system.
Figure 4.
(a) RMSD as a function of frame number for trajectories generated in 5 different batches. Colored lines represent individual trials, highlighting run-to-run variability and the diversity of conformations generated by TSS-Pro. (b) Representative structural snapshots from the generated trajectories, illustrating conformational changes over time (top: ubiquitin; bottom: dCRY).
Figure 5.
Potential of Mean Force (PMF) profiles of the ϕ-ψ dihedral angle distribution for alanine dipeptide: (a) 100 ns MD simulation, (b) TSS-Pro sampling with the 5-frame model, (c) TSS-Pro sampling with the 10-frame model, and (d) TSS-Pro sampling with the 20-frame model.
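A PMF over two dihedral angles is commonly estimated from sampled angles by Boltzmann inversion of a 2D histogram, F(ϕ, ψ) = −k_B T ln P(ϕ, ψ), shifted so that the global minimum is zero. The sketch below shows that standard estimator; it is not taken from the paper. The temperature (300 K), bin count, and the synthetic two-basin data are assumptions for illustration only.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pmf_2d(phi, psi, bins=36, temperature=300.0):
    """Boltzmann-invert a 2D dihedral histogram (angles in degrees).

    Returns the free energy in kcal/mol, shifted so its minimum is 0.
    Empty bins are assigned +inf. The 300 K default is an assumption.
    """
    hist, phi_edges, psi_edges = np.histogram2d(
        phi, psi, bins=bins, range=[[-180, 180], [-180, 180]], density=True)
    with np.errstate(divide="ignore"):
        free = -KB_KCAL * temperature * np.log(hist)
    free -= free[np.isfinite(free)].min()
    return free, phi_edges, psi_edges

def wrap(angle):
    """Wrap angles in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

# Synthetic data: a heavily populated basin and a sparsely populated one.
rng = np.random.default_rng(0)
phi = wrap(np.concatenate([rng.normal(-75, 15, 8000), rng.normal(65, 15, 2000)]))
psi = wrap(np.concatenate([rng.normal(145, 15, 8000), rng.normal(45, 15, 2000)]))

free, phi_edges, psi_edges = pmf_2d(phi, psi)
major = free[10, 32]  # bin containing (phi, psi) = (-75, 145)
minor = free[24, 22]  # bin containing (phi, psi) = (65, 45)
print(major, minor)
```

Applying the same function to angles from an MD trajectory and from generated frames gives surfaces that can be compared bin by bin, which is the kind of comparison shown across the panels of this figure.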
Figure 6.
Comparison of the conformational space distributions from MD simulations and TSS-Pro generated trajectories. Potential of Mean Force of (a) 100 ns MD used for training, (b) 100 ns independent MD, and (c) 100 ns ubiquitin TSS-Pro consecutive sampling.
Figure 7.
(a) RMSD evolution for predicted trajectories and MD simulations over 100 ns, demonstrating consistency with MD dynamics. Close-up views of the RMSD jump in (b) the TSS-Pro-generated trajectory and (d) the 100 ns training MD trajectory. (c) Root-mean-square fluctuation (RMSF) profiles comparing TSS-Pro-generated trajectories with MD simulations, highlighting flexible loop regions and the C-terminal tail (CTT). The RGG motif (residues 74–76) shows the highest fluctuations, consistent with its functional role.
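RMSD relative to the initial frame and per-residue RMSF can both be computed directly from a coordinate array. The following is a minimal sketch, assuming the frames are already superimposed on a common reference (no alignment is performed) and one representative atom per residue. The toy trajectory, with larger noise on the last three positions of a 76-residue chain, is synthetic and only mimics a flexible tail.

```python
import numpy as np

def rmsd_to_first(traj):
    """RMSD of every frame relative to frame 0.

    traj has shape (n_frames, n_atoms, 3) and is assumed pre-aligned.
    """
    diff = traj - traj[0]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

def rmsf(traj):
    """Per-atom root-mean-square fluctuation about the mean position."""
    mean_pos = traj.mean(axis=0)
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))

# Toy trajectory: 200 frames, 76 positions, with a more mobile 3-residue tail.
rng = np.random.default_rng(0)
n_frames, n_res = 200, 76
scale = np.full(n_res, 0.2)
scale[-3:] = 1.0  # synthetic flexible tail, not fitted to any real data
toy_traj = rng.normal(size=(n_frames, n_res, 3)) * scale[None, :, None]

rmsd_vals = rmsd_to_first(toy_traj)
rmsf_vals = rmsf(toy_traj)
print(rmsd_vals[:3], rmsf_vals[-3:])
```

With real trajectories a superposition step (for example a Kabsch fit to a reference structure) would normally precede both calculations, so that overall rotation and translation do not inflate the values.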
Figure 8.
MDSCAN-Based Clustering of Ubiquitin Conformations from TSS-Pro Consecutive Sampling. (a) Average pairwise RMSD values between clusters; (b) Representative centroid structure of each cluster.
Figure 9.
Parallel sampling distribution based on the conditional frames for (a) ubiquitin 5-frame and (b) ubiquitin 10-frame methods.
Figure 10.
Parallel sampling distribution based on the conditional frames for (a) dCRY 5-frame and (b) dCRY 10-frame methods.
Figure 11.
Benchmark of TSS-Pro assessing the quality and efficiency of protein trajectory generation. (a) Atom clash rate of generated trajectories for ubiquitin and dCRY, where lower values indicate more physically realistic structures. (b) TM-score comparison between generated trajectories and reference frames, with values closer to 1.0 indicating high fidelity to native conformations and lower values indicating greater conformational diversity. (c) Training time of TSS-Pro for the three systems. (d, e) Runtime benchmark showing the wall-clock time required for TSS-Pro (d) parallel sampling and (e) consecutive sampling compared with conventional MD simulations across alanine dipeptide, ubiquitin, and dCRY.
Figure 12.
Comparison of the BERT, MLP, and UNet models in our cDDPM framework. (a) Atomic clash rates calculated from 40 sampled structures for each model; (b) Validation loss curves during the training process.

