[Preprint]. 2025 Oct 7:2025.10.07.680994.
doi: 10.1101/2025.10.07.680994.

Bayesian multi-state multi-condition modeling of a protein structure based on X-ray crystallography data

Matthew Hancock et al. bioRxiv.

Abstract

An atomic structure model of a protein can be computed from the diffraction pattern of its crystal. While most crystallographic studies produce a single set of atomic coordinates, the billions of protein molecules in a crystal sample many conformational modes during data collection. As a result, a 'multi-state' model that depicts these conformations could reproduce the X-ray data better than a single conformation, and is thus likely to be more accurate. Computing such a multi-state model is challenging because its data-to-parameter ratio is lower than that of single-state modeling. To address this challenge, additional information could be considered, such as X-ray datasets collected for the same system under distinct experimental conditions (e.g., temperature, ligands, mutations, and pressure). Here, we develop, benchmark, and illustrate MultiXray: Bayesian multi-state multi-condition modeling for X-ray crystallography. The input information is several X-ray datasets collected under distinct conditions and a molecular mechanics force field. The model consists of an independent coordinate set for each of several states and the weight of each state under each condition. A Bayesian posterior model density quantifies the match of the model with all X-ray datasets and the force field. A sample of models is drawn from the posterior model density using biased molecular dynamics (MD) simulations. We benchmark MultiXray on simulated CypA X-ray data. Using a second X-ray dataset improves Rfree from 0.105 to 0.089. We then demonstrate MultiXray on experimental temperature-dependent data for SARS-CoV-2 Mpro. Using multiple X-ray datasets improves Rfree from 0.253 for the PDB-deposited structure to 0.237. MultiXray is implemented in our open-source Integrative Modeling Platform (IMP) software, relying on integration with Phenix, thus making it easily applicable to many studies.
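For readers unfamiliar with the Rfree values quoted above, the following is a minimal sketch of a crystallographic R-factor, assuming a single linear scale factor between observed and computed amplitudes. It is not the MultiXray or Phenix implementation, which also handle bulk solvent and other scaling terms; the function name `r_factor` and the index lists are hypothetical. Rfree is the same quantity evaluated only on reflections held out from model fitting.

```python
def r_factor(f_obs, f_calc, indices):
    """R = sum(| |F_obs| - k |F_calc| |) / sum(|F_obs|) over the chosen reflections.

    f_obs   : observed amplitudes (non-negative floats)
    f_calc  : computed structure factors (complex or float)
    indices : which reflections to include (work set or held-out free set)
    The scale k = sum|F_obs| / sum|F_calc| is a simplifying assumption.
    """
    obs = [f_obs[i] for i in indices]
    calc = [abs(f_calc[i]) for i in indices]
    k = sum(obs) / sum(calc)
    return sum(abs(o - k * c) for o, c in zip(obs, calc)) / sum(obs)


# Toy data (hypothetical): every 4th reflection is held out as the free set.
f_obs = [10.0, 20.0, 30.0, 40.0, 12.0, 18.0, 33.0, 41.0]
f_calc = [1.1 + 0j, 1.9 + 0j, 3.0 + 0j, 4.2 + 0j, 1.0 + 0j, 2.0 + 0j, 3.1 + 0j, 4.0 + 0j]
free = [i for i in range(len(f_obs)) if i % 4 == 0]
work = [i for i in range(len(f_obs)) if i % 4 != 0]
print("R_work:", r_factor(f_obs, f_calc, work))
print("R_free:", r_factor(f_obs, f_calc, free))
```

A lower value means the computed amplitudes agree more closely with the measured ones, which is why the decreases reported in the abstract count as improvements.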

Figures

Figure 1. Four stages of modeling.
Modeling is framed as a search for models whose computed properties match input information. Here, the input information is the CHARMM22 force field parameters, multiple X-ray datasets collected for the same system under distinct experimental conditions (e.g., temperature), and a prior model. The model representation includes variables specifying atomic coordinates for each conformational state, along with a matrix of weight variables for each state under each condition. All states and weights are scored collectively for each X-ray dataset. Each state is individually scored against the molecular mechanics force field. A sample is drawn from the posterior model density using biased molecular dynamics simulations. All states are initialized from a prior model, and the force on the atoms is computed from the satisfaction of all X-ray datasets and the potential energy.
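The model representation described in this caption can be sketched as follows. This is an illustration only, assuming that the computed structure factors for a condition are the weight-averaged structure factors of the states and that each condition's weights are non-negative and sum to 1. The function name `mix_states` and all numeric values are hypothetical and are not taken from the paper or from IMP.

```python
def mix_states(state_factors, weights):
    """Combine per-state structure factors into per-condition computed data.

    state_factors[s][h] : complex structure factor of state s at reflection h
    weights[c][s]       : weight of state s under condition c
    Returns f_calc[c][h] = sum_s weights[c][s] * state_factors[s][h].
    """
    for row in weights:
        # Assumption: weights behave like populations within a condition.
        if min(row) < 0 or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each condition's weights must be >= 0 and sum to 1")
    n_refl = len(state_factors[0])
    return [
        [sum(w * f_s[h] for w, f_s in zip(row, state_factors)) for h in range(n_refl)]
        for row in weights
    ]


# Two states, two conditions, three reflections (all values hypothetical).
state_factors = [
    [4.0 + 1.0j, -2.0 + 0.5j, 1.0 - 3.0j],  # state 1
    [3.5 + 1.5j, -1.0 + 1.0j, 0.5 - 2.0j],  # state 2
]
weights = [
    [0.8, 0.2],  # condition 1: state 1 dominates
    [0.3, 0.7],  # condition 2: state 2 dominates
]
for c, f_calc in enumerate(mix_states(state_factors, weights), start=1):
    print("condition", c, [abs(f) for f in f_calc])
```

Because the same coordinate sets feed every condition and only the weight rows differ, each additional dataset adds observations while adding only one row of weights as new parameters.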
Figure 2. Degenerate scattering based on a multi-state model.
The X-ray forward model will produce identical computed data, and therefore an identical posterior model density, for the following 5 combinations of 2-state models and unit cells: (1) A single crystal with different unit cells containing atoms in either M1 or M2 with probability w1 and w2, respectively. (2) Two crystals, with the first crystal containing atoms in M1 and the second crystal containing atoms in M2, with the proportion of atoms in M1 and M2 corresponding to w1 and w2, respectively. (3) A single crystal containing identical partially occupied unit cells with atoms in M1 and M2 with occupancy w1 and w2, respectively. (4) The same as (1) but with the two state indices swapped. (5) A single crystal with different unit cells, each one of which contains a mixture of atoms in the two states in equal proportion. As the number of unit cells approaches infinity, the reciprocal lattice points simulated from a crystal in (2), (3), (4), and (5) converge to those for the crystal in (1). We simulate scattering for (1)-(5) using the discrete Fourier transform (DFT). As the number of unit cells in the crystal increases, the error between the reciprocal lattice points of (2)-(5) and (1) converges to 0. The error is defined as the Euclidean distance between the structure factor and the reference structure factor, normalized by the magnitude of the reference structure factor.
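The convergence described in this caption can be illustrated with a toy 1D crystal. The sketch below is not the authors' simulation. It compares only cases (1) and (3), and it uses the deterministic partial-occupancy crystal (3) as the reference, whereas the caption uses (1). At integer reciprocal lattice indices h the lattice phase exp(2πi·h·n) equals 1 for every cell n, so the per-cell average structure factor of the crystal is the average of the individual unit-cell structure factors. The atom positions, weights, and function names are hypothetical, and the error is aggregated over all reflections as a single vector norm.

```python
import cmath
import math
import random


def cell_structure_factors(positions, h_max):
    """F(h), h = 1..h_max, for one 1D unit cell of unit scatterers at fractional coordinates."""
    return [sum(cmath.exp(2j * math.pi * h * x) for x in positions)
            for h in range(1, h_max + 1)]


def relative_error(f, f_ref):
    """Euclidean distance between f and f_ref, normalized by the magnitude of f_ref."""
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(f, f_ref)))
    den = math.sqrt(sum(abs(b) ** 2 for b in f_ref))
    return num / den


def mixed_crystal(f1, f2, w1, n_cells, rng):
    """Case (1): each cell is in M1 with probability w1, otherwise in M2.

    Returns the per-cell average structure factor at the reciprocal lattice points.
    """
    n1 = sum(1 for _ in range(n_cells) if rng.random() < w1)
    n2 = n_cells - n1
    return [(n1 * a + n2 * b) / n_cells for a, b in zip(f1, f2)]


M1 = [0.10, 0.35, 0.70]  # hypothetical fractional coordinates, state M1
M2 = [0.12, 0.40, 0.70]  # hypothetical fractional coordinates, state M2
W1 = 0.7                 # hypothetical weight of M1; w2 = 1 - W1
F1 = cell_structure_factors(M1, 8)
F2 = cell_structure_factors(M2, 8)
# Case (3): identical cells with occupancies w1 and w2.
F_REF = [W1 * a + (1 - W1) * b for a, b in zip(F1, F2)]

rng = random.Random(0)
for n_cells in (10, 1000, 100000):
    err = relative_error(mixed_crystal(F1, F2, W1, n_cells, rng), F_REF)
    print(n_cells, err)
```

In this toy setting the error is proportional to the gap between the realized fraction of M1 cells and w1, so it shrinks roughly as one over the square root of the number of cells.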
Figure 3. Native model for benchmarking.
A 2-state 2-condition native model based in part on a previously determined structure model of CypA (PDB: 3K0M). The native contains 2 conformational states shown in green and orange, respectively. The 2 states exhibit structural heterogeneity typical of an actual crystal: small backbone deviations and large side chain deviations. The population of each state under the 2 simulated conditions varies. State 1, representative of a low energy state, is the dominant state at 100 K while state 2, representative of a high energy state, is the dominant state at 300 K. The corresponding weight matrix is shown.
Figure 4. Benchmark of multi-state multi-condition modeling.
Top left panel shows the mean and standard deviation of the score (joint log-likelihood) of the best-scoring model from 1000 sub-samples consisting of an increasing number of MD trajectories. The score of the native is shown as a dashed line. Top right panel shows the mean and standard deviation of the accuracy of the best-scoring model for the same sub-samples. Bottom left panel shows the mean and standard deviation of the accuracy of the most accurate model for the same sub-samples. Bottom right panel shows the score (joint log-likelihood) and accuracy of 1000 decoy 2-state 2-condition models generated in 5000 short MD simulations, selected randomly to approximately evenly span an accuracy range from 0 to 1 Å. X indicates the score and accuracy of the native model.
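The sub-sampling statistic in the top left panel can be sketched as follows. This is one plausible reading of the caption rather than the authors' procedure: it assumes that each MD trajectory is summarized by the score of its best model, that a higher joint log-likelihood is better, and that sub-samples are drawn without replacement. The function name and the toy scores are hypothetical.

```python
import random
import statistics


def best_score_convergence(trajectory_best_scores, sizes, n_subsamples=1000, seed=0):
    """For each sub-sample size k, draw n_subsamples random sets of k trajectories
    and report the mean and standard deviation of the best score found in each set."""
    rng = random.Random(seed)
    summary = {}
    for k in sizes:
        bests = [max(rng.sample(trajectory_best_scores, k)) for _ in range(n_subsamples)]
        summary[k] = (statistics.mean(bests), statistics.pstdev(bests))
    return summary


# Toy scores (hypothetical): one best log-likelihood per trajectory.
toy_rng = random.Random(42)
scores = [toy_rng.gauss(-100.0, 5.0) for _ in range(200)]
for k, (mean, sd) in best_score_convergence(scores, sizes=(1, 10, 50, 200)).items():
    print(k, round(mean, 2), round(sd, 2))
```

When every trajectory is included, each sub-sample is the full set, so the spread collapses to zero; a curve that flattens before that point suggests that additional trajectories are unlikely to find a better-scoring model.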
Figure 5. Using data from multiple conditions improves recovery of native.
Top, the best-scoring 2-state 2-condition CypA model for condition 1, as identified by the best Rfree for dataset 1. The model is overlaid on state 1 and state 2 of the native (transparent). To illustrate the accuracy of the backbone and side chain model heterogeneity, the Glu134 atoms and the Cα atoms of Leu39 - Gly47 are shown. The 2-condition model correctly fits both the backbone and side chain heterogeneity of the native. Bottom, the best-scoring 1-state 1-condition model for condition 1, as identified by the best Rfree for dataset 1. The 1-condition model does not correctly fit the side chain and backbone heterogeneity of the native. Both states 1 and 2 of the 1-condition model are a closer fit to state 1 of the native than state 2 of the native.
Figure 6. 2-state multi-condition models of SARS-CoV-2 Mpro.
Left, the Rfree of the best-scoring 2-state multi-condition model vs the Rfree of the PDB-deposited model for each dataset (6 in total), colored by temperature. Right, the Rfree of the best-scoring 2-state 2-condition model vs the Rfree of the best-scoring 2-state 1-condition model, colored by temperature. In both cases, a point below the line indicates improvement for the corresponding dataset.
