J Chem Theory Comput. 2009 Jan 1;5(1):47-58.
doi: 10.1021/ct800282a.

Extracting Kinetic and Stationary Distribution Information from Short MD Trajectories via a Collection of Surrogate Diffusion Models

Christopher P Calderon et al. J Chem Theory Comput.

Abstract

Low-dimensional stochastic models can summarize dynamical information and make long-time predictions associated with observables of complex atomistic systems. Maximum-likelihood-based techniques for estimating low-dimensional surrogate diffusion models from relatively short time series are presented. It is found that a heterogeneous population of slowly evolving conformational degrees of freedom modulates the dynamics. This underlying heterogeneity results in a collection of estimated low-dimensional diffusion models. Numerical techniques for exploiting this finding to approximate skewed histograms associated with the simulation are presented. Statistical tests are also used to assess the validity of the models and to determine physically relevant sampling information, e.g., the maximum sampling frequency at which one can discretely sample from an atomistic time series and still have a surrogate diffusion model pass goodness-of-fit tests. The information extracted from such analyses can potentially be used to assist umbrella sampling computations and to help approximate effective diffusion coefficients. The techniques are demonstrated on simulations of adenylate kinase.
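As an illustration of fitting a surrogate diffusion model to a short time series, the sketch below estimates an Ornstein-Uhlenbeck model by conditional maximum likelihood. It is not the authors' estimator. The parameterization dX = rate·(mean_level − X) dt + noise dW, the function names, and the synthetic test values are all assumptions made for this example; `noise` is meant to play the role of the OU noise parameter C mentioned in the figure captions.

```python
import math
import random


def simulate_ou(mean_level, rate, noise, x0, dt, n, rng):
    """Sample n points of dX = rate*(mean_level - X) dt + noise dW exactly."""
    phi = math.exp(-rate * dt)  # one-step autoregressive coefficient
    step_sd = noise * math.sqrt((1.0 - phi * phi) / (2.0 * rate))
    xs = [x0]
    for _ in range(n - 1):
        xs.append(mean_level + (xs[-1] - mean_level) * phi
                  + step_sd * rng.gauss(0.0, 1.0))
    return xs


def fit_ou(xs, dt):
    """Conditional MLE of (mean_level, rate, noise) from uniformly spaced data.

    The exact OU transition density is Gaussian and linear in the previous
    observation, so the MLE reduces to a least-squares fit of x[k+1] on x[k].
    """
    x, y = xs[:-1], xs[1:]
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    if not 0.0 < phi < 1.0:
        raise ValueError("no mean-reverting OU fit for this series")
    intercept = my - phi * mx
    resid_var = sum((yi - intercept - phi * xi) ** 2
                    for xi, yi in zip(x, y)) / m
    rate = -math.log(phi) / dt
    mean_level = intercept / (1.0 - phi)
    noise = math.sqrt(2.0 * rate * resid_var / (1.0 - phi * phi))
    return mean_level, rate, noise


# Usage: one short synthetic path (350 points, dt = 0.30; values are made up).
rng = random.Random(0)
path = simulate_ou(mean_level=1.0, rate=2.0, noise=0.5, x0=1.0,
                   dt=0.30, n=350, rng=rng)
print(fit_ou(path, dt=0.30))
```

Repeating the fit over many independent short paths yields a collection of parameter estimates, which is the kind of object the abstract refers to; with only a few hundred points per path, the individual estimates scatter noticeably around the true values.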

Figures

Figure 1
The C parameter of the Ornstein-Uhlenbeck process was estimated using short time series from 81 different (independent) umbrella sampling (US) windows. The estimated values are denoted by symbols; the line connecting the points is only a guide for the eye. Each parameter estimate came from a time series containing 350 uniformly spaced entries. Three different ds values were used. The corresponding time Δt between observations is reported in the legend.
Figure 2
Hypothesis test results. In each panel, the staircase plots correspond to the empirical distribution function (EDF) of the test statistics obtained from batches of 75 time series (each using a different ds value; the corresponding time between observations, Δt, is reported in the legend), and the solid curve corresponds to the null distribution computed for a finite sample size of 350, which was the length of each time series analyzed in this plot. The shaded region shows the α = 0.10 critical value. The percentage of models rejected at this level can be found by noting the point on the x-axis where a color change occurs (denoted by xcrit) and then evaluating 1 − EDF(xcrit). Panel (a): The Q-test statistic given in Ref. [42] was applied to determine the time one must wait between observations before an overdamped diffusion model can be applied to simulation data. The surrogate model parameters were estimated for each path, and then the Q-test statistic was computed using the data and the estimated model. Panel (b): The T3 test statistic computed using the same estimated parameters and data.
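The rejection fraction described in this caption, 1 − EDF(xcrit), can be sketched in a few lines. The sketch assumes an upper-tail test in which a statistic larger than the critical value leads to rejection; the function names and the sample numbers are hypothetical, and the critical value is taken as given rather than computed from the null distribution.

```python
from bisect import bisect_right


def make_edf(samples):
    """Return the empirical distribution function of the samples."""
    ordered = sorted(samples)
    n = len(ordered)

    def edf(x):
        # Fraction of samples that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return edf


def fraction_rejected(test_statistics, x_crit):
    """Fraction of fitted models whose statistic exceeds x_crit."""
    return 1.0 - make_edf(test_statistics)(x_crit)


# Usage with made-up statistics from a small batch of fitted models.
batch = [0.1, 0.5, 1.0, 1.5, 2.0]
print(fraction_rejected(batch, x_crit=1.2))
```

If the surrogate model were adequate, one would expect the rejected fraction at α = 0.10 to be close to 0.10; a much larger fraction suggests the sampling interval is too short for the model class.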
Figure 3
Autocorrelation (AC) measured from MD data taken from US point ΔDrmsd0=7.03. The thick line represents the mean AC function, obtained by estimating the AC for each of the sample paths in the full set of 525 time series and then averaging the results. The thin lines labeled “Path i” are representative ACs from individual paths. The thick dotted horizontal lines correspond to the 95% confidence intervals.
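The per-path estimate followed by averaging can be sketched as below. The normalization (biased estimator divided by the lag-0 sum) and the ±1.96/√N white-noise band are assumptions chosen for the example; the source does not say how its 95% confidence intervals were computed.

```python
import math


def autocorr(xs, max_lag):
    """Sample autocorrelation of one path for lags 0..max_lag.

    Assumes the path is not constant (otherwise the lag-0 sum is zero).
    """
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]


def mean_autocorr(paths, max_lag):
    """Estimate the AC of each path separately, then average lag by lag."""
    acs = [autocorr(p, max_lag) for p in paths]
    return [sum(ac[k] for ac in acs) / len(acs) for k in range(max_lag + 1)]


def white_noise_band(n):
    """Half-width of the usual 95% band for an uncorrelated series of length n."""
    return 1.96 / math.sqrt(n)


# Usage with two tiny made-up paths.
paths = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0]]
print(mean_autocorr(paths, max_lag=2))
print(white_noise_band(len(paths[0])))
```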
Figure 4
(a) Scatter plot of the estimated C of the Ornstein-Uhlenbeck (OU) model vs C. The data consist of the estimated OU noise parameter (C), obtained using time series of uniformly sampled observations spaced by Δt = 0.30. The noise parameter was estimated for 50 batches of short time series, all from US simulations at the constraint point ΔDrmsd0=1.38, and is plotted against the (temporal) average value of C for the corresponding ΔDrmsd time series used to estimate the OU parameters. The linear correlation (r) between the estimated C and C was found to be 0.34, and the associated p-value was 1.0 × 10⁻³. (b) Representative time series of the “fast” ΔDrmsd coordinate and (c) of the “slow” C coordinate. The three color-coded trajectories in (b) and (c) correspond to the three color-coded symbols in (a).
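A linear correlation and an associated p-value of the kind quoted in this caption can be sketched as follows. The caption does not say how its p-value was obtained; the permutation test used here is a substitute chosen because it needs only the standard library, and the function names, permutation count, and data are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value for r under random re-pairing of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Usage with made-up paired values (e.g. a fitted parameter vs a slow average).
slow_avg = [0.1, 0.4, 0.2, 0.9, 0.7, 0.3, 0.8, 0.5]
fitted = [1.0, 1.3, 1.1, 1.6, 1.2, 1.4, 1.7, 1.2]
print(pearson_r(slow_avg, fitted), permutation_p_value(slow_avg, fitted))
```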
Figure 5
The histograms obtained from running MD simulations using 7 different constraint points are reported. Each data point contains the results from 50 independent simulations run for 105 ps (again uniform sampling with Δt = 0.30 ps). The prediction of the simple OU model which accounts for the conformational heterogeneity (see text for details) is shown as a solid line. In most cases this crude approximation is accurate; the largest discrepancies here are in the leftmost and rightmost distributions.
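One plausible way for a collection of OU models to produce a skewed histogram is to average the stationary densities of the individual models. The sketch below shows that construction and contrasts it with the single density obtained from averaged parameters. It is an assumption about the general idea and not the procedure from the paper's text; the parameterization dX = rate·(mean_level − X) dt + noise dW, the equal weighting, and the parameter values are made up for the example.

```python
import math


def ou_stationary_pdf(x, mean_level, rate, noise):
    """Stationary density of the OU model: Gaussian with variance noise^2/(2*rate)."""
    var = noise * noise / (2.0 * rate)
    return (math.exp(-(x - mean_level) ** 2 / (2.0 * var))
            / math.sqrt(2.0 * math.pi * var))


def mixture_pdf(x, models):
    """Equal-weight average of the stationary densities of all fitted models."""
    return sum(ou_stationary_pdf(x, *m) for m in models) / len(models)


def mean_parameter_pdf(x, models):
    """Stationary density of a single OU model built from averaged parameters."""
    n = len(models)
    mean_level, rate, noise = (sum(m[i] for m in models) / n for i in range(3))
    return ou_stationary_pdf(x, mean_level, rate, noise)


# Usage: two hypothetical fitted models as (mean_level, rate, noise) tuples.
models = [(0.0, 1.0, 0.5), (1.5, 1.0, 1.5)]
for x in (-1.0, 0.0, 1.0, 2.0, 3.0):
    print(x, mixture_pdf(x, models), mean_parameter_pdf(x, models))
```

The mixture is a sum of Gaussians with different centers and widths, so it can be asymmetric, whereas the averaged-parameter density is always a single symmetric Gaussian.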
Figure 6
Stationary density estimate focusing on the rightmost density shown in Fig. 5 (corresponding to ΔDrmsd0=7.03). The result obtained using the Ornstein-Uhlenbeck surrogate (solid red line) was poor. A batch of overdamped models was estimated (from the same time series used to fit the Ornstein-Uhlenbeck models in Fig. 5). The solid line denotes the invariant density obtained by appealing to Equation 7, and the dotted line represents the invariant density prediction obtained by using 〈θ〉. The collection of thin blue lines displays some representative invariant density predictions (i.e., “piEQ” in Equation 7). The histogram of Φ coming from an ensemble of 75 genuine MD time series of length 525 ps is represented as the jagged line.

References

    1. Bustamante C, Bryant Z, Smith S. Nature. 2003;421:423.
    2. Carrion-Vazquez M, Oberhauser A, Fisher T, Marszalek P, Li H, Fernandez J. Prog. Biophys. Mol. Bio. 2000;74:63.
    3. Stock G, Ghosh K, Dill K. J. Chem. Phys. 2008;128:194102.
    4. Collin D, Ritort F, Jarzynski C, Smith S, Tinoco I, Jr., Bustamante C. Nature. 2005;437:231.
    5. Min W, Gopich I, English B, Kou S, Xie X, Szabo A. J. Phys. Chem. B. 2006;110:20093.