Proc IEEE Symp Comput Intell Bioinforma Comput Biol. 2019 Jul:2019:10.1109/cibcb.2019.8791469.
doi: 10.1109/cibcb.2019.8791469. Epub 2019 Aug 8.

A Probabilistic Programming Approach to Protein Structure Superposition

Lys Sanz Moreta et al. Proc IEEE Symp Comput Intell Bioinforma Comput Biol. 2019 Jul.

Abstract

Optimal superposition of protein structures or other biological molecules is crucial for understanding their structure, function, dynamics and evolution. Here, we investigate the use of probabilistic programming to superimpose protein structures guided by a Bayesian model. Our model, THESEUS-PP, is based on the THESEUS model, a probabilistic model of protein superposition based on rotation, translation and perturbation of an underlying, latent mean structure. The model was implemented in the probabilistic programming language Pyro. Unlike conventional methods, which minimize the sum of squared distances between corresponding atoms, THESEUS takes into account correlations between atom positions and heteroscedasticity (i.e., different atom positions can have different variances). THESEUS performs maximum likelihood estimation using iterative expectation-maximization. In contrast, THESEUS-PP allows automated maximum a posteriori (MAP) estimation using suitable priors over rotation, translation, variances and the latent mean structure. The results indicate that probabilistic programming is a powerful new paradigm for formulating Bayesian probabilistic models of biomolecular structure. Specifically, we envision the use of the THESEUS-PP model as an error model or likelihood in Bayesian protein structure prediction using deep probabilistic programming.
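The difference between the least-squares criterion and a model with correlated, heteroscedastic atoms can be made concrete with the matrix-normal log-density. The sketch below is illustrative only and is not code from the paper: the function name `matrix_normal_logpdf` is hypothetical, and it assumes an observed structure X ~ MN(M, U, I_3), i.e. an N-by-N among-row (atom-atom) covariance U and the 3-by-3 identity as the among-column covariance.

```python
import numpy as np


def matrix_normal_logpdf(X, M, U):
    """Log-density of X ~ MN(M, U, I_3) (hypothetical helper, for illustration).

    X, M: (N, 3) coordinate matrices (observed structure and its mean).
    U:    (N, N) among-row (atom-atom) covariance; assumed positive definite.
    The among-column covariance is assumed to be the 3x3 identity.
    """
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    U = np.asarray(U, dtype=float)
    n, p = X.shape
    sign, logdet_U = np.linalg.slogdet(U)
    if sign <= 0:
        raise ValueError("U must have a positive determinant")
    D = X - M
    # tr(D^T U^{-1} D), using a linear solve instead of an explicit inverse.
    quad = np.trace(D.T @ np.linalg.solve(U, D))
    return float(-0.5 * n * p * np.log(2.0 * np.pi)
                 - 0.5 * p * logdet_U
                 - 0.5 * quad)
```

With U = σ²I (all atoms uncorrelated and equally variable), the density is a constant minus the sum of squared distances divided by 2σ², so maximizing it over rotation and translation recovers the conventional least-squares fit. Off-diagonal entries or unequal diagonal entries in U are what allow correlated and heteroscedastic atom positions.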

Keywords: Bayesian modelling; deep probabilistic programming; protein structure prediction; protein superposition.

Figures

Fig. 1:
The THESEUS-PP model as a Bayesian graphical model. M is the latent mean structure, an N-by-3 coordinate matrix, where N is the number of atoms. t1 and t2 are the translations. q is a unit quaternion calculated from three random variables u0, u1 and u2 sampled from the unit interval, and R is the corresponding rotation matrix. U is the among-row variance matrix of a matrix-normal distribution; X1 and X2 are N-by-3 coordinate matrices representing the proteins to be superimposed. Circles denote random variables; squares denote deterministic transformations of random variables. Shaded circles denote observed variables. Capital and lowercase letters denote matrices and vectors, respectively.
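A common way to turn three unit-interval variables into a unit quaternion is Shoemake's construction, which yields rotations uniformly distributed over all orientations when u0, u1 and u2 are independent Uniform(0, 1) draws. The sketch below assumes that construction and a (w, x, y, z) component order; the caption does not state the exact mapping THESEUS-PP uses, and both function names are hypothetical.

```python
import math
import random


def quaternion_from_uniforms(u0, u1, u2):
    """Map three numbers in [0, 1] to a unit quaternion (w, x, y, z).

    Assumes Shoemake's construction: if u0, u1, u2 are independent
    Uniform(0, 1) draws, the quaternion is uniform on the unit 3-sphere.
    """
    a = math.sqrt(1.0 - u0)
    b = math.sqrt(u0)
    return (
        a * math.sin(2.0 * math.pi * u1),
        a * math.cos(2.0 * math.pi * u1),
        b * math.sin(2.0 * math.pi * u2),
        b * math.cos(2.0 * math.pi * u2),
    )


def rotation_matrix(q):
    """3x3 rotation matrix (list of rows) for a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


# Example: one random rotation drawn from three uniform variables.
rng = random.Random(0)
R = rotation_matrix(quaternion_from_uniforms(rng.random(), rng.random(), rng.random()))
```

In a probabilistic program, placing Uniform(0, 1) priors on the three u variables and computing R deterministically from them mirrors the square (deterministic) nodes q and R in the graphical model.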
Fig. 2:
Protein pairs superimposed with conventional RMSD superposition (left) and with THESEUS-PP (right). The protein in green (X2) is the rotated structure. The images were generated with PyMOL [13].
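Conventional RMSD superposition is typically computed with Kabsch's SVD-based least-squares solution (Horn and Coutsias et al. give equivalent quaternion formulations). The minimal NumPy sketch below is not the code used for the figures; the function name is hypothetical. It also returns per-atom distances after fitting, the kind of quantity plotted in Fig. 3.

```python
import numpy as np


def kabsch_superpose(X1, X2):
    """Least-squares superposition of X2 onto X1 (Kabsch's SVD solution).

    X1, X2: (N, 3) arrays of corresponding atom (e.g. C-alpha) coordinates.
    Returns (X2_fit, rmsd, per_atom_dist), where X2_fit is X2 after the
    optimal rotation and translation.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    # Optimal translation: superimpose the centroids.
    c1, c2 = X1.mean(axis=0), X2.mean(axis=0)
    A, B = X1 - c1, X2 - c2
    # SVD of the 3x3 cross-covariance matrix between the centered sets.
    U, _, Vt = np.linalg.svd(B.T @ A)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate every row of B, then move it onto X1's centroid.
    X2_fit = B @ R.T + c1
    per_atom_dist = np.linalg.norm(X2_fit - X1, axis=1)
    rmsd = float(np.sqrt(np.mean(per_atom_dist ** 2)))
    return X2_fit, rmsd, per_atom_dist
```

Because every atom enters the sum with equal weight, a few highly variable regions can pull the whole fit; the heteroscedastic THESEUS model is meant to address this behaviour.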
Fig. 3:
Distances (in Å) between corresponding Cα atoms of each superimposed structure pair. The blue and orange lines represent RMSD and THESEUS-PP superposition, respectively.

References

1. Kabsch W, “A discussion of the solution for the best rotation to relate two sets of vectors,” Acta Cryst. A, vol. 34, pp. 827–828, 1978.
2. Horn B, “Closed-form solution of absolute orientation using unit quaternions,” J. Opt. Soc. Am. A, vol. 4, pp. 629–642, 1987.
3. Coutsias E, Seok C, and Dill K, “Using quaternions to calculate RMSD,” J. Comput. Chem., vol. 25, pp. 1849–1857, 2004.
4. Theobald DL and Wuttke DS, “Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem,” PNAS, vol. 103, pp. 18521–18527, 2006.
5. Theobald DL and Steindel PA, “Optimal simultaneous superpositioning of multiple structures with missing data,” Bioinformatics, vol. 28, pp. 1972–1979, 2012.

LinkOut - more resources