Brief Bioinform. 2024 Mar 27;25(3):bbae137.

doi: 10.1093/bib/bbae137.

Multilevel superposition for deciphering the conformational variability of protein ensembles

Takashi Amisaki¹

PMID: 38557679
PMCID: PMC10983786
DOI: 10.1093/bib/bbae137

Takashi Amisaki. Brief Bioinform. 2024.

Affiliation

¹ Department of Biological Regulation, Faculty of Medicine, Tottori University, Yonago, Tottori 683-8503, Japan.

Abstract

The dynamics and variability of protein conformations are directly linked to their functions. Many comparative studies of X-ray protein structures have been conducted to elucidate the relevant conformational changes, dynamics and heterogeneity. The rapid increase in the number of experimentally determined structures has made comparison an effective tool for investigating protein structures. For example, it is now possible to compare structural ensembles grouped by enzyme species, by variant or by the type of ligand bound. In this study, the author developed a multilevel model for estimating two covariance matrices that represent inter- and intra-ensemble variability in the Cartesian coordinate space. In applications to cytochrome P450 family 2 enzymes and class A $\beta$-lactamases, principal component analysis of the two estimated covariance matrices identified inter- and intra-enzyme variabilities that appear to be important for enzyme function. In P450, in which each enzyme has its own active site of a distinct size, an active-site motion shared universally between the enzymes was captured as the first principal mode of the intra-enzyme covariance matrix. In this case, the method was useful for understanding the conformational variability after adjusting for the differences between enzyme sizes. The method is advantageous when ensembles are small and is therefore promising for comparative studies of experimentally determined structures, where ensemble sizes are smaller than those generated by, for example, molecular dynamics simulations.
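The idea of separating inter- and intra-ensemble variability can be illustrated with a much simpler calculation than the one the abstract describes. The sketch below is not the author's multilevel model or its EM-based estimation; it is a plain moment-based split (pooled within-group covariance and scatter of group means) followed by principal component analysis. It assumes the structures are already superposed and flattened to rows of length 3 × (number of atoms); the function names `between_within_covariances` and `principal_modes` are hypothetical.

```python
import numpy as np

def between_within_covariances(coords, labels):
    """Moment-based split of variability into inter- and intra-ensemble parts.

    coords: (n_structures, 3 * n_atoms) array of superposed, flattened coordinates.
    labels: length n_structures array giving the ensemble of each structure.
    Assumes at least two ensembles and more structures than ensembles.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    grand_mean = coords.mean(axis=0)
    dim = coords.shape[1]
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for g in groups:
        block = coords[labels == g]
        mean_g = block.mean(axis=0)
        centered = block - mean_g
        within += centered.T @ centered              # scatter around the ensemble mean
        diff = (mean_g - grand_mean)[:, None]
        between += block.shape[0] * (diff @ diff.T)  # scatter of the ensemble means
    within /= coords.shape[0] - len(groups)          # pooled intra-ensemble covariance
    between /= len(groups) - 1                       # inter-ensemble mean square
    return between, within

def principal_modes(cov, k=2):
    """Return the k largest eigenvalues and their eigenvectors (as columns)."""
    vals, vecs = np.linalg.eigh(cov)                 # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]
```

With small ensembles these moment estimates are noisy, and the inter-ensemble mean square also absorbs part of the intra-ensemble noise. That weakness is the kind of small-sample problem the abstract says the multilevel model is meant to handle.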

Keywords: EM algorithm; covariance matrix; principal component analysis; random effects model; structural superposition.

Figures

**Figure 1**
The MAE of the estimates of standard deviations. The upper four panes are the plots of the MAE of the square root of the diagonal components of for 10 (A), 20 (B), 40 (C) and 80 (D). The lower panes are the respective plots for . The averages for true values for and were 0.52 and 0.51, respectively. Lower MAE indicates higher accuracy. The accuracy of the TS/OLS estimates was lower than those of the other two methods. The accuracy of the REM was slightly higher than the other two at low .

**Figure 2**
The RSMIP values of the top two eigenvectors as a function of . The upper four panes are the plots for with 10 (A), 20 (B), 40 (C) and 80 (D). The lower panes are the respective plots for . At low , the accuracy of REM estimates was slightly higher than those of the other methods.
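A subspace-overlap score of this kind can be sketched as follows, assuming that "RSMIP" denotes the root mean square inner product (usually abbreviated RMSIP) between two sets of orthonormal eigenvectors. The function name `rmsip` is hypothetical, and the caption does not state how the paper computes its values.

```python
import numpy as np

def rmsip(modes_a, modes_b):
    """Root mean square inner product between two sets of modes.

    modes_a, modes_b: (dim, k) arrays whose columns are orthonormal eigenvectors.
    Returns 1.0 when the two sets span the same subspace and 0.0 when the
    subspaces are orthogonal. The sign of each eigenvector does not matter.
    """
    a = np.asarray(modes_a, dtype=float)
    b = np.asarray(modes_b, dtype=float)
    overlaps = a.T @ b                     # all pairwise dot products
    return float(np.sqrt(np.sum(overlaps ** 2) / a.shape[1]))
```

For a single pair of eigenvectors (k = 1), the score reduces to the absolute value of their dot product.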

**Figure 3**
The plot of openness and coordinates of the CYP2 structures along the PC1 of the covariance matrices: (A) PSE/IWLS , (B) REM and (C) REM . Openness was inferred from descriptions in the literature. ND is the abbreviation for non-determined and indicates that informative descriptions were not found in the literature. The PC1 of REM seems to have a better correlation with openness. Notice the difference in the scale of the vertical axes.

**Figure 4**
Open and closed structures of CYP2 generated in accordance with the PC1 of the estimated covariance matrices. The C$\alpha$ traces are colored by residue index, from red (C-terminal) to blue (N-terminal). The open structures (A) and (B) were obtained by adding the highest coordinates of the PC1 of SPE and REM , respectively, throughout the structures to the estimated mean structures. The closed structures (C) and (D) were obtained by similarly adding the lowest coordinates of the PC1 of SPE and REM , respectively. Structures (A) and (B) show the larger cleft produced by the upward shift of the F–G region. In addition, in structure (A), the upward shift of the B–C loop is remarkable, enlarging the active site cavity as shown by the elongated rod dist3. The three rods are drawn as defined in Table 1.
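The construction described in this caption, adding an extreme PC1 coordinate to the mean structure, can be sketched as below. This is an illustration under assumptions: the mode is taken to be a unit eigenvector flattened in x1, y1, z1, x2, ... order, and the function names `displace_along_mode` and `per_atom_displacement` are hypothetical.

```python
import numpy as np

def displace_along_mode(mean_structure, mode, score):
    """Return mean_structure shifted by `score` along one principal mode.

    mean_structure: (n_atoms, 3) array of mean coordinates.
    mode: flattened eigenvector of length 3 * n_atoms (assumed x1, y1, z1, ... order).
    score: coordinate along the mode, e.g. the highest or lowest projection
           observed among the structures.
    """
    mean_structure = np.asarray(mean_structure, dtype=float)
    mode = np.asarray(mode, dtype=float).reshape(mean_structure.shape)
    return mean_structure + score * mode

def per_atom_displacement(mode, score, n_atoms):
    """Magnitude of the displacement of each atom for a given score."""
    mode = np.asarray(mode, dtype=float).reshape(n_atoms, 3)
    return np.abs(score) * np.linalg.norm(mode, axis=1)
```

Calling `displace_along_mode` once with the highest score and once with the lowest gives a pair of structures analogous to the open and closed forms in the figure. `per_atom_displacement` gives a per-residue profile when one C$\alpha$ atom per residue is used.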

**Figure 5**
The average magnitudes of per-residue deviations of the CYP2 conformations along PC1. (A) SPE , (B) REM and (C) REM . Each deviation is the average over the 162 conformations. The residue numbers of 2C8 are shown. The capital letters and bars at the bottom indicate the major helices and their regions (adopted from Ref. 37).

**Figure 6**
Projections of the X-ray structures of the $\beta$-lactamases onto the PC1–PC2 plane of the covariance matrices: (A) REM and (B) REM . (A) The structures were loosely classified into three clusters: non-ESBLs; half of the ESBLs; and carbapenemases together with the other half of the ESBLs. The outliers of PenL, 6afmA and 6afnA, are the structures of a non-catalytic-region mutant C69Y [39]. (B) The outliers of SHV-1 and TEM-1 were neighbors of each other’s major clusters. The structures 1pzpA and 1pzoA were reported as core-disrupted structures [43]. All structures of the SHV-1-BLIP complex were located in the vicinity of the TEM-1 cluster. The other three chains (A–C) of 5eph were not included in the analysis because of insufficient coordinates of C$\alpha$ atoms.

**Figure 7**
The superimposed mean structures of the $\beta$-lactamases. The cartoon drawing is the mean structure of TEM-1. The other structures are shown in colored traces. The gray, blue and orange lines are the structures of ESBL, non-ESBL and carbapenemases, respectively. The location of – loop was noticeable. The locations of the carbapenemases and the half of the ESBLs were shifted to cover the – loop which faced the -loop.

**Figure 8**
Displacement of the C$\alpha$ atoms of each structure of the $\beta$-lactamases. The coordinates in the first three PCs of (A) SPE , (B) REM and (C) REM were transformed to the deviations in the Cartesian coordinates (Å) and plotted per residue. The peak occurred at the hinge-11 region (218–223) for the deviation along the PC1 of (C, left). -, –, and – (169–176, 238–243, 266–275, respectively) constituted triple peaks in the plot for the PC2 of (C, middle). The standalone peak in (C, right) was produced mainly by the outliers in BEL-1 (5ephD). The PC1s of and were almost identical, as indicated by a dot product between the two as high as 0.961.

**Figure 9**
Superimposed mean group structures of calmodulin. (A) REM and (B) TS/OLS. The traces of the C$\alpha$ atoms are shown. The yellow and blue lines represent the average structures of the NMR and X-ray ensembles, respectively. For example, the uppermost helix in (A) belongs to the average structure of an X-ray ensemble. Calmodulin consists of N- and C-domains and a link connecting them. In the structures superimposed using the (heteroscedastic) REM method, the N-terminal half exhibited a better fit than the C-terminal half.

Grants and funding

JP15K00404/Japan Society for the Promotion of Science
