PeerJ. 2021 Apr 21;9:e11330.
doi: 10.7717/peerj.11330. eCollection 2021.

Fast computational mutation-response scanning of proteins

Julian Echave. PeerJ. 2021.

Abstract

Studying the effect of perturbations on protein structure is a basic approach in protein research. Important problems, such as predicting pathological mutations and understanding patterns of structural evolution, have been addressed by computational simulations that model mutations using forces and predict the resulting deformations. In single mutation-response scanning simulations, a sensitivity matrix is obtained by averaging deformations over point mutations. In double mutation-response scanning simulations, a compensation matrix is obtained by minimizing deformations over pairs of mutations. These simulation-based methods are very useful, but they may be too slow to deal with large proteins, protein complexes, or large protein databases. To address this issue, I derived closed-form analytical formulas that calculate the sensitivity and compensation matrices directly, without simulations. Here, I present these derivations and show that the resulting analytical methods are much faster than their simulation counterparts (with 200 mutations per site, about 126 times faster for single and 137 times faster for double mutation-response scanning).
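The modelling idea summarised above (a mutation represented as forces, the structure answering with a deformation) can be sketched with an anisotropic elastic network model in linear response. This is an illustration under stated assumptions, not the author's code: the random toy structure, the 10 Å cutoff, the unit spring constants and the mutation model (independent Gaussian forces along every spring that touches the mutated site) are all assumptions, and the function names are hypothetical. The deformation is dr = K^+ f, where K^+ is the pseudo-inverse of the network Hessian K.

```python
import numpy as np


def enm_hessian(coords, cutoff=10.0):
    """Hessian (3N x 3N) of an anisotropic elastic network with unit springs.

    Also returns the contacts as (i, j, unit vector from i to j).
    """
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    contacts = []
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist = np.linalg.norm(d)
            if dist > cutoff:
                continue
            e = d / dist
            block = np.outer(e, e)
            hess[3*i:3*i+3, 3*j:3*j+3] -= block
            hess[3*j:3*j+3, 3*i:3*i+3] -= block
            hess[3*i:3*i+3, 3*i:3*i+3] += block
            hess[3*j:3*j+3, 3*j:3*j+3] += block
            contacts.append((i, j, e))
    return hess, contacts


def pseudo_inverse(hess, tol=1e-8):
    """Pseudo-inverse K^+ of the Hessian, discarding the rigid-body zero modes."""
    evals, evecs = np.linalg.eigh(hess)
    keep = evals > tol * evals.max()
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T


def mutation_force(n, contacts, site, rng, sigma=1.0):
    """Hypothetical mutation model: a random force along each spring touching `site`.

    Both ends of a spring get equal and opposite forces, so the total force
    and torque on the structure are zero.
    """
    f = np.zeros(3 * n)
    for i, j, e in contacts:
        if site in (i, j):
            a = rng.normal(0.0, sigma)  # random stretch/compression amplitude
            f[3*i:3*i+3] -= a * e
            f[3*j:3*j+3] += a * e
    return f


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 12.0, size=(30, 3))      # toy structure: 30 sites in a 12 Å box
hess, contacts = enm_hessian(coords)
cov = pseudo_inverse(hess)
f = mutation_force(len(coords), contacts, site=5, rng=rng)
dr = cov @ f                                        # linear response, so hess @ dr == f
shift = np.linalg.norm(dr.reshape(-1, 3), axis=1)   # structural shift of each site
```

Because the spring forces carry no net force or torque, f lies in the space of internal deformations, and the pseudo-inverse returns the exact linear response to it.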

Keywords: Compensatory mutations; Mutational response; Protein.

Conflict of interest statement

The author declares that he has no competing interests.

Figures

Figure 1. Comparison between sMRS and aMRS sensitivity matrices. Results shown for Phospholipase A2 (d1jiaa).
The sensitivity matrix S has elements Sij that measure the structural shift of site i averaged over mutations at site j. sMRS is a simulation-based Mutation Response Scanning method that calculates S by averaging over simulated point mutations. aMRS is an analytical method that calculates S using a closed formula. (A) sMRS sensitivity matrix obtained by averaging over 200 mutations per site (simulation) compared with the aMRS matrix (analytical). (B) Scatterplot of the sMRS vs. aMRS matrix elements of A. (C) Convergence of sMRS towards aMRS with increasing number of mutations per site. In C, the d1jiaa case is shown with black lines and points, and the other 9 proteins studied are shown with grey lines. Matrix elements Sij are normalised so that their average is 1. A logarithmic scale is used in A and B, and R is the Pearson correlation coefficient between the log-transformed sMRS and aMRS matrix elements.
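The contrast between sMRS and aMRS can be sketched on a toy elastic network. Assuming a mutation at site j applies independent Gaussian forces of variance sigma^2 along each spring k that touches j (a hypothetical model), the deformation is linear in the random amplitudes, so the average squared shift of site i has the closed form S_ij = sigma^2 Σ_k |(K^+ u_jk)_i|^2, where u_jk is the unit force pattern of spring k. The simulated matrix is a Monte Carlo estimate of the same average. This shows why a closed formula can replace the simulation; it is not necessarily the formula derived in the paper, and all names are hypothetical.

```python
import numpy as np


def enm(coords, cutoff=10.0):
    """Elastic network Hessian (unit springs) and contacts (i, j, unit vector i->j)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    contacts = []
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist = np.linalg.norm(d)
            if dist <= cutoff:
                e = d / dist
                b = np.outer(e, e)
                hess[3*i:3*i+3, 3*j:3*j+3] -= b
                hess[3*j:3*j+3, 3*i:3*i+3] -= b
                hess[3*i:3*i+3, 3*i:3*i+3] += b
                hess[3*j:3*j+3, 3*j:3*j+3] += b
                contacts.append((i, j, e))
    return hess, contacts


def pseudo_inverse(hess, tol=1e-8):
    """Pseudo-inverse of the Hessian without the rigid-body zero modes."""
    evals, evecs = np.linalg.eigh(hess)
    keep = evals > tol * evals.max()
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T


def spring_forces(n, contacts, site):
    """3N x K matrix; column k is the unit force pattern of the k-th spring touching `site`."""
    cols = []
    for i, j, e in contacts:
        if site in (i, j):
            u = np.zeros(3 * n)
            u[3*i:3*i+3] = -e
            u[3*j:3*j+3] = e
            cols.append(u)
    return np.column_stack(cols)


def per_site(sq, n):
    """Sum squared Cartesian components (3N rows) into per-site rows (n rows)."""
    return sq.reshape(n, 3, -1).sum(axis=1)


def sensitivity_analytical(cov, n, contacts, sigma=1.0):
    S = np.zeros((n, n))
    for j in range(n):
        R = cov @ spring_forces(n, contacts, j)       # responses to unit spring forces
        S[:, j] = sigma**2 * per_site(R**2, n).sum(axis=1)
    return S


def sensitivity_simulated(cov, n, contacts, n_mut, rng, sigma=1.0):
    S = np.zeros((n, n))
    for j in range(n):
        U = spring_forces(n, contacts, j)
        amps = rng.normal(0.0, sigma, size=(U.shape[1], n_mut))  # n_mut random mutations
        dR = cov @ (U @ amps)                                     # one deformation per column
        S[:, j] = per_site(dR**2, n).mean(axis=1)                 # average over mutations
    return S


rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 12.0, size=(30, 3))   # toy structure
hess, contacts = enm(coords)
cov = pseudo_inverse(hess)
n = len(coords)

S_an = sensitivity_analytical(cov, n, contacts)
S_sim = sensitivity_simulated(cov, n, contacts, n_mut=200, rng=rng)
S_an, S_sim = S_an / S_an.mean(), S_sim / S_sim.mean()   # normalise: matrix average is 1
R = np.corrcoef(np.log(S_an).ravel(), np.log(S_sim).ravel())[0, 1]
```

The simulated matrix is only an estimate; increasing n_mut makes it approach the analytical matrix, which is the kind of convergence the figure tracks in panel C.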
Figure 2. Comparison of sMRS and aMRS marginal profiles. Results shown for Phospholipase A2 (d1jiaa).
The influence profile is the average of the sensitivity matrix over rows; element Sj measures the average influence of mutations at site j. The sensitivity profile is the average of the sensitivity matrix over columns; element Si measures the average sensitivity of site i. (A) Sj profiles obtained with sMRS using 200 mutations per site (simulation) and aMRS (analytical); (B) scatter plot of the sMRS vs. aMRS Sj values of A; (C) convergence of the sMRS Sj profile towards the aMRS profile. (D) Si profiles obtained with sMRS using 200 mutations per site (simulation) and aMRS (analytical); (E) scatter plot of the sMRS vs. aMRS Si values of D; (F) convergence of the sMRS Si profiles towards the aMRS profile. In C and F, the d1jiaa case is shown with black lines and points, and the other 9 proteins studied are shown with grey lines. Profiles were calculated using the normalised matrix (matrix average is 1). Profile elements are shown in logarithmic scale, and R is the Pearson correlation coefficient between log-transformed sMRS and aMRS profiles.
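Both marginal profiles are plain row and column averages of the normalised matrix. A minimal NumPy sketch, assuming S is stored with the responding site i along rows and the mutated site j along columns (as in the definition of Sij); the function names are hypothetical:

```python
import numpy as np


def marginal_profiles(S):
    """Return the (influence, sensitivity) profiles of a sensitivity matrix.

    Rows index the responding site i and columns the mutated site j, so
    influence[j] = mean_i S[i, j] and sensitivity[i] = mean_j S[i, j].
    The matrix is first normalised so that its average is 1.
    """
    S = np.asarray(S, dtype=float)
    S = S / S.mean()
    influence = S.mean(axis=0)     # Sj: average over rows
    sensitivity = S.mean(axis=1)   # Si: average over columns
    return influence, sensitivity


def log_pearson(x, y):
    """Pearson correlation between log-transformed profiles (the R of the figures)."""
    return np.corrcoef(np.log(x), np.log(y))[0, 1]


influence, sensitivity = marginal_profiles([[1.0, 2.0], [3.0, 4.0]])
# influence   -> [0.8, 1.2]
# sensitivity -> [0.6, 1.4]
```

Applied to the compensation matrix D, the same column and row averages give the Dj and Di profiles of Figure 5.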
Figure 3. The analytical mutation-response scanning method (aMRS) is much faster than the simulation method (sMRS).
(A) CPU time vs. protein size for sMRS with 200 mutations per site (simulation) and for aMRS (analytical). Time is shown in logarithmic scale. From the slopes of the linear fits it follows that both times scale as N^1.5 (N is the number of sites; each point is one protein). (B) The CPU time of the simulation method increases linearly with the CPU time of the analytical method, with a speedup of 126: t_sMRS = 126 × t_aMRS. (C) The speedup t_sMRS/t_aMRS, obtained as shown in B, increases linearly with the number of mutations per site. Calculations were performed on the proteins of Table 1 using the methods implemented in R, with base LAPACK and the optimised AtlasBLAS libraries for matrix operations, on an early-2018 MacBook Pro notebook (processor i7-8850H).
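The two fits in this figure can be reproduced on any set of timings with a few lines of NumPy. In this sketch the scaling exponent is the slope of a straight-line fit of log10(time) against log10(N), and the speedup is the least-squares slope of a line through the origin, which is one reading of "t_sMRS = 126 × t_aMRS" (the caption does not say whether an intercept was fitted). The timings are synthetic, generated from an exact N^1.5 law with a speedup of 100 only to check that the fits recover these values; they are not the paper's measurements, and the function names are hypothetical.

```python
import numpy as np


def scaling_exponent(sizes, times):
    """Slope b of log10(time) = a + b * log10(N), i.e. time ~ N**b."""
    slope, _intercept = np.polyfit(np.log10(sizes), np.log10(times), 1)
    return slope


def speedup(t_analytical, t_simulation):
    """Least-squares k in t_simulation = k * t_analytical (line through the origin)."""
    t_an = np.asarray(t_analytical, dtype=float)
    t_sim = np.asarray(t_simulation, dtype=float)
    return float(t_an @ t_sim / (t_an @ t_an))


sizes = np.array([100, 200, 400, 800])   # number of sites N (synthetic)
t_an = 1e-4 * sizes**1.5                 # synthetic analytical timings, seconds
t_sim = 100 * t_an                       # synthetic simulation timings
print(scaling_exponent(sizes, t_an))     # ~1.5
print(speedup(t_an, t_sim))              # ~100.0
```

Since a simulation repeats the response calculation for every sampled mutation, its cost would be expected to grow in proportion to the number of mutations per site, which is consistent with the linear trend in panel C.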
Figure 4. Comparison of sDMRS and aDMRS compensation matrices. Results shown for Phospholipase A2 (d1jiaa).
The compensation matrix D has elements Dij that measure the maximum compensation of the structural deformation due to a mutation at site i afforded by a second mutation at j. sDMRS is a simulation-based Double Mutation Response Scanning method that calculates D by maximizing the structural compensation over pairs of simulated mutations. aDMRS is an analytical method that calculates D using a closed formula. (A) sDMRS compensation matrix obtained using 200 mutations per site (simulation) compared with the aDMRS matrix (analytical). (B) Scatterplot of the sDMRS vs. aDMRS matrix elements of A. (C) Convergence of the sDMRS matrix towards the aDMRS matrix with increasing number of mutations per site. In C the d1jiaa case is shown with black lines and points, and the other 9 proteins studied are shown with grey lines. Dij are normalised so that their average is 1, logarithmic scales are used in A and B, and R is Pearson’s correlation coefficient between log-transformed sDMRS and aDMRS matrix elements.
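A closed form for compensation can be sketched in the same linear-response setting. Assumptions (not taken from the paper): a toy elastic network with a random structure; a mutation at site i modelled as independent Gaussian forces along the springs touching i; a deformation measured as the total squared displacement of all sites; and a second mutation at j allowed to be any combination of forces along the springs touching j. The best second mutation then cancels the orthogonal projection of the first deformation onto the subspace that j can reach, so the expected compensation is D_ij = sigma^2 ||P_j K^+ U_i||_F^2. For each sampled first mutation, a simulation that keeps the best of M sampled second mutations cannot exceed this projection, so on average it approaches the bound from below as M grows. The function names are hypothetical.

```python
import numpy as np


def enm(coords, cutoff=10.0):
    """Elastic network Hessian (unit springs) and contacts (i, j, unit vector i->j)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    contacts = []
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist = np.linalg.norm(d)
            if dist <= cutoff:
                e = d / dist
                b = np.outer(e, e)
                hess[3*i:3*i+3, 3*j:3*j+3] -= b
                hess[3*j:3*j+3, 3*i:3*i+3] -= b
                hess[3*i:3*i+3, 3*i:3*i+3] += b
                hess[3*j:3*j+3, 3*j:3*j+3] += b
                contacts.append((i, j, e))
    return hess, contacts


def pseudo_inverse(hess, tol=1e-8):
    """Pseudo-inverse of the Hessian without the rigid-body zero modes."""
    evals, evecs = np.linalg.eigh(hess)
    keep = evals > tol * evals.max()
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T


def spring_forces(n, contacts, site):
    """3N x K matrix; column k is the unit force pattern of the k-th spring touching `site`."""
    cols = []
    for i, j, e in contacts:
        if site in (i, j):
            u = np.zeros(3 * n)
            u[3*i:3*i+3] = -e
            u[3*j:3*j+3] = e
            cols.append(u)
    return np.column_stack(cols)


def compensation_analytical(cov, n, contacts, sigma=1.0):
    """D[i, j]: expected squared deformation removed by the best second mutation at j.

    x = cov @ U_i @ a is the deformation of a mutation at i, a ~ N(0, sigma^2 I).
    For y in span(cov @ U_j): min |x + y|^2 = |x|^2 - |P_j x|^2, with P_j the
    orthogonal projector onto that span; averaging over a gives
    D_ij = sigma^2 * ||P_j cov U_i||_F^2.
    """
    R = [cov @ spring_forces(n, contacts, k) for k in range(n)]
    D = np.zeros((n, n))
    for j in range(n):
        Q, _ = np.linalg.qr(R[j])   # orthonormal basis of the subspace j can reach
        for i in range(n):
            D[i, j] = sigma**2 * np.sum((Q.T @ R[i]) ** 2)
    return D


rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 12.0, size=(25, 3))   # toy structure
hess, contacts = enm(coords)
cov = pseudo_inverse(hess)
n = len(coords)

D = compensation_analytical(cov, n, contacts)
# Expected total squared deformation of a single mutation at each site (sigma = 1).
total = np.array([np.sum((cov @ spring_forces(n, contacts, i)) ** 2) for i in range(n)])
D_norm = D / D.mean()   # normalised so that the matrix average is 1
```

Two properties follow from the projection and make useful checks: a second mutation at the same site can undo the first completely (D_ii equals the total deformation), and no site can compensate more than the total deformation.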
Figure 5. Comparison of sDMRS and aDMRS marginal profiles. Results shown for Phospholipase A2 (d1jiaa). Two marginal profiles are considered.
The Dj profile is the average of the compensation matrix over rows; element Dj measures the ability of j to compensate mutations at other sites. The Di profile is the average of the compensation matrix over columns; element Di measures the degree to which a mutation at i can be compensated by mutations elsewhere. (A) sDMRS Dj profile obtained using 200 mutations per site (simulation) and aDMRS Dj profile (analytical); (B) scatter plot of the sDMRS vs. aDMRS Dj values of A; (C) convergence of the sDMRS Dj profile towards the aDMRS profile. (D) sDMRS Di profile obtained using 200 mutations per site (simulation) and aDMRS Di profile (analytical); (E) scatter plot of the sDMRS vs. aDMRS Di values of D; (F) convergence of the sDMRS Di profile towards the aDMRS profile. In C and F, the d1jiaa case is shown with black lines and points, and the other 9 proteins studied are shown with grey lines. Profiles were calculated with normalised matrices (matrix average is 1), they are in logarithmic scale, and R is the Pearson correlation coefficient between the log-transformed sDMRS and aDMRS profiles.
Figure 6. The analytical double mutation-response scanning method (aDMRS) is much faster than the simulation method (sDMRS).
(A) CPU time vs. protein size for sDMRS with 200 mutations per site (simulation) and for aDMRS (analytical). Time is shown in logarithmic scale. From the slopes of the linear fits it follows that both CPU times scale as N^3 (N is the number of sites; each point is one protein). (B) The CPU time of the simulation method increases linearly with the CPU time of the analytical method, with a speedup of 137: t_sDMRS = 137 × t_aDMRS. (C) The speedup t_sDMRS/t_aDMRS increases non-linearly with the number of mutations per site, M, tending towards O(M^2) for large M. Calculations were performed on the proteins of Table 1 using the methods implemented in R, with base LAPACK and the optimised AtlasBLAS libraries for matrix operations, on an early-2018 MacBook Pro notebook (processor i7-8850H).
