Proteins. 2016 Jun;84(6):841-54.
doi: 10.1002/prot.25034. Epub 2016 Apr 9.

Dissecting the roles of local packing density and longer-range effects in protein sequence evolution

Amir Shahmoradi et al. Proteins. 2016 Jun.

Abstract

What are the structural determinants of protein sequence evolution? A number of site-specific structural characteristics have been proposed, most of which are broadly related to either the density of contacts or the solvent accessibility of individual residues. Most importantly, there has been disagreement in the literature over the relative importance of solvent accessibility and local packing density for explaining site-specific sequence variability in proteins. We show that this discussion has been confounded by the definition of local packing density. The most commonly used measures of local packing, such as contact number and the weighted contact number, represent the combined effects of local packing density and longer-range effects. As an alternative, we propose a truly local measure of packing density around a single residue, based on the Voronoi cell volume. We show that the Voronoi cell volume, when calculated relative to the geometric center of amino-acid side chains, behaves nearly identically to the relative solvent accessibility, and each individually can explain, on average, approximately 34% of the site-specific variation in evolutionary rate in a data set of 209 enzymes. An additional 10% of variation can be explained by nonlocal effects that are captured in the weighted contact number. Consequently, evolutionary variation at a site is determined by the combined effects of the immediate amino-acid neighbors of that site and effects mediated by more distant amino acids. We conclude that instead of contrasting solvent accessibility and local packing density, future research should emphasize the relative importance of immediate contacts and longer-range effects on evolutionary variation. Proteins 2016; 84:841-854. © 2016 Wiley Periodicals, Inc.

Keywords: contact number; packing density; protein evolution; protein structure; solvent accessibility.

Figures

Figure 1
Schematic of our analysis approach. (A, B) Evolutionary rates at each site in a protein correlate with measures of packing density, such as WCN (A) or Voronoi cell volume (B). Results are shown here for all amino acids in one representative protein (PDB ID: 1ONR, chain A). (C) For all proteins in the data set, the Spearman’s correlation coefficients of the ER–WCN and ER–Cell Volume relations are calculated and compared to each other. Each black point represents the two correlation values for one protein. The red dashed line represents the equality line for the absolute values of the correlation strengths. (D) Finally, we convert the set of correlation coefficients into distributions and compare their relative means. On average, WCN correlates more strongly with ER than cell volume does.
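The per-protein comparison described in panels (C) and (D) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names, the dictionary layout, and the toy per-site values are all hypothetical, and Spearman's coefficient is computed here as the Pearson correlation of average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site values for two toy "proteins" (not data from the paper).
proteins = {
    "toy_a": {"er": [0.2, 0.9, 0.5, 1.4], "wcn": [3.1, 1.2, 2.0, 0.8], "vol": [90, 150, 120, 200]},
    "toy_b": {"er": [1.0, 0.3, 0.7], "wcn": [0.9, 2.5, 1.1], "vol": [100, 95, 180]},
}

# One (ER-WCN, ER-volume) pair per protein, as in panel (C).
rho_by_protein = {
    name: (spearman(d["er"], d["wcn"]), spearman(d["er"], d["vol"]))
    for name, d in proteins.items()
}

# Mean absolute correlation strength across proteins, as compared in panel (D).
mean_abs_wcn = sum(abs(r[0]) for r in rho_by_protein.values()) / len(rho_by_protein)
mean_abs_vol = sum(abs(r[1]) for r in rho_by_protein.values()) / len(rho_by_protein)
```

Comparing absolute values mirrors the caption's equality line, since WCN correlates negatively with ER while cell volume correlates positively.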
Figure 2
Absolute correlation of evolutionary rate with (A) Contact Number and (B) Weighted Contact Number for varying degrees of locality of these quantities. In (A), we vary the cutoff parameter r0 in Eq. 2 from 0 to 50 Å. In (B), we vary the exponent α in Eq. 4 from −30 to 30. In each plot, the solid black line represents the mean correlation strength in the entire dataset of 209 proteins and the dashed black line indicates the median of the distribution. The green-shaded region together with the red-dashed lines represent the first and third quartiles (25th and 75th percentiles) of the correlation-strength distribution. Note that for the case of WCN with α > 0 the sign of the correlation strength ρ is the opposite of the sign of ρ with α < 0. In addition, ρ is undefined at α = 0 and not shown in this plot. The parameter values at which the correlation coefficient reaches the maximum over the entire dataset are given in Table 1.
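Eq. 2 and Eq. 4 are not reproduced in this record. The sketch below assumes the usual forms: a contact number that counts neighbors within a cutoff r0, and a weighted contact number that sums a power of the pairwise distance, with α = −2 giving the conventional inverse-square weighting. The function names, the use of `<=` at the cutoff, and the toy coordinates are assumptions for illustration only.

```python
import math


def contact_number(coords, r0):
    """For each residue, count the other residues within distance r0 (assumed form of Eq. 2)."""
    return [
        sum(1 for j, q in enumerate(coords) if j != i and math.dist(p, q) <= r0)
        for i, p in enumerate(coords)
    ]


def weighted_contact_number(coords, alpha=-2.0):
    """For each residue, sum r_ij**alpha over all other residues (assumed form of Eq. 4)."""
    return [
        sum(math.dist(p, q) ** alpha for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    ]


# Three hypothetical reference points on a line, distances in angstroms.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

cn_tight = contact_number(coords, r0=1.5)             # only the closest pair is in contact
cn_loose = contact_number(coords, r0=2.0)             # the middle residue gains a second contact
wcn = weighted_contact_number(coords, alpha=-2.0)     # inverse-square weighting
wcn_flat = weighted_contact_number(coords, alpha=0.0) # every residue gets N - 1
```

Under this assumed form, α = 0 assigns every residue the same value, so a rank correlation with evolutionary rate cannot be computed, which is consistent with the caption's remark that ρ is undefined there. Large negative α emphasizes the nearest neighbors, while positive α makes distant residues dominate and reverses the sign of the correlation.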
Figure 3
Fraction of residues that are included in the calculation of WCNcutoff when its correlation with site-specific evolutionary rate is maximized. For the vast majority of proteins, the highest correlation strengths are obtained when over 90% of the residues in the protein are included in the calculation of WCNcutoff.
Figure 4
Example of a Voronoi tessellation in two dimensions. The red dots represent the seed points, and the black lines delineate the Voronoi cells. For protein structures, the tessellation is carried out in three dimensions.
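To make the two-dimensional picture concrete, here is a small sketch that computes Voronoi cell areas by clipping a bounding rectangle against the perpendicular bisector between each pair of seeds. It is only an illustration: the bounding box is an assumption introduced here to keep every cell finite, the function names are hypothetical, and the paper's tessellation is three-dimensional (cell volumes rather than areas).

```python
def clip_to_half_plane(polygon, a, b, c):
    """Keep the part of a convex polygon where a*x + b*y <= c."""
    clipped = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            clipped.append(p)
        if (fp < 0 < fq) or (fq < 0 < fp):
            # The edge crosses the boundary line; add the crossing point.
            t = fp / (fp - fq)
            clipped.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return clipped


def polygon_area(polygon):
    """Area of a simple polygon by the shoelace formula."""
    n = len(polygon)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def voronoi_cell_areas(seeds, box):
    """Area of each seed's Voronoi cell, restricted to box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    areas = []
    for i, (sx, sy) in enumerate(seeds):
        cell = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
        for j, (ox, oy) in enumerate(seeds):
            if j == i:
                continue
            # Points closer to seed i than to seed j satisfy
            # 2*(o - s) . x <= |o|^2 - |s|^2.
            cell = clip_to_half_plane(
                cell,
                2 * (ox - sx),
                2 * (oy - sy),
                ox * ox + oy * oy - sx * sx - sy * sy,
            )
        areas.append(polygon_area(cell))
    return areas


# Four hypothetical seed points at the quadrant centers of a 4 x 4 box.
seeds = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]
areas = voronoi_cell_areas(seeds, box=(0.0, 0.0, 4.0, 4.0))
```

Each cell depends only on the seeds that border it, which is the sense in which a Voronoi cell is a purely local measure of packing.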
Figure 5
Correlations and partial correlations of ER with various Voronoi cell properties. (A) Distributions of ER correlations with the Voronoi cell properties sphericity, edge length, eccentricity, volume, and area. Note that all cell characteristics correlate positively with ER, except sphericity, which correlates negatively with ER. We show here the correlations of ER with (−1)×cell sphericity so that the correlation distributions all appear on the same positive scale. The correlations for cell area and cell volume are not significantly different from each other, and are higher than the correlations for all other Voronoi cell properties (Table 2). (B) Distributions of partial correlations of ER with Voronoi cell properties, controlling for cell volume. All partial correlations are relatively small, with medians of approximately 0.1. Therefore, none of these cell properties provides much independent information about ER once cell volume is accounted for.
Figure 6
Correlations and partial correlations of ER with Voronoi cell volume, WCN, and RSA. (A) Distributions of ER correlations with Voronoi cell volume, WCN (using side-chain and Cα coordinates, denoted by SC and CA, respectively), and RSA. Note that the WCN measures correlate negatively with ER. We show here correlations of ER with (−1)×WCN so that the correlation distributions all appear on the same positive scale. The correlations of ER with WCN (SC) are significantly higher than the correlations of ER with all other quantities shown (Table 4). (B) Distributions of partial correlations of ER with WCN and RSA, controlling for cell volume. The partial correlations of ER with −WCN are substantial, with median values of 0.32 (side-chain WCN) and 0.21 (Cα WCN). By contrast, the partial correlations of ER with RSA largely vanish, with a median value of 0.09.
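The record does not say how the partial correlations were computed. One common approach, shown here as a sketch under that assumption, applies the standard first-order partial correlation formula to the three pairwise correlation coefficients. The function name and the input values are hypothetical and are not taken from the paper.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z, from the three pairwise correlations."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical pairwise correlations: x = ER, y = RSA, z = cell volume.
# If RSA and cell volume are themselves strongly correlated, little of the
# ER-RSA association is left once cell volume is controlled for.
er_rsa_given_volume = partial_correlation(r_xy=0.58, r_xz=0.58, r_yz=0.90)

# If the control variable is unrelated to both, the correlation is unchanged.
unchanged = partial_correlation(r_xy=0.50, r_xz=0.0, r_yz=0.0)
```

This is the pattern the caption describes: a quantity that largely duplicates cell volume adds little once volume is accounted for, whereas a quantity carrying additional longer-range information retains a substantial partial correlation.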
Figure 7
Side-chain centers provide the most informative reference points for both WCN and Voronoi tessellation. (A) Distribution of correlations of WCN with ER, for seven different coordinate sets according to which WCN was calculated: SC, AA, CB, CA, N, C, O. Each coordinate set represents a different way of identifying the reference location of each residue. For SC (Side Chain) and AA (entire Amino Acid), the reference point is given, respectively, by the geometric average coordinates of the side-chain atoms and of all atoms in the amino acid. The latter include the backbone but exclude any hydrogen atoms. The coordinate sets CB, CA, N, C, and O use the respective atom in the amino acid as the reference point. (B) As in (A), but using Voronoi Cell Volume instead of WCN. (C) As in (A), but the correlations are calculated with RSA instead of with ER. (D) As in (B), but the correlations are calculated with RSA instead of with ER.
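The seven coordinate sets can be illustrated with a small helper that picks one reference point per residue. This is a sketch with several assumptions that the caption does not settle: the atom tuple layout and function name are hypothetical, hydrogens are dropped for SC as well as AA, only N, CA, C, and O are treated as backbone, and glycine (which has no side-chain heavy atoms) falls back to its CA atom.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}  # assumption: terminal atoms such as OXT are not handled


def reference_point(atoms, mode="SC"):
    """Return the (x, y, z) reference point of one residue.

    atoms: list of (atom_name, element, (x, y, z)) tuples for the residue.
    mode:  "SC", "AA", or a single atom name such as "CB", "CA", "N", "C", "O".
    """
    heavy = [a for a in atoms if a[1] != "H"]
    if mode == "SC":
        chosen = [a for a in heavy if a[0] not in BACKBONE_ATOMS]
        if not chosen:
            # Assumption: glycine has no side-chain heavy atoms, so use CA.
            chosen = [a for a in heavy if a[0] == "CA"]
    elif mode == "AA":
        chosen = heavy
    else:
        chosen = [a for a in heavy if a[0] == mode]
    if not chosen:
        raise ValueError(f"no atoms available for mode {mode!r}")
    n = len(chosen)
    return tuple(sum(a[2][k] for a in chosen) / n for k in range(3))


# Hypothetical alanine-like residue with made-up coordinates.
residue = [
    ("N", "N", (0.0, 0.0, 0.0)),
    ("CA", "C", (1.0, 0.0, 0.0)),
    ("C", "C", (2.0, 0.0, 0.0)),
    ("O", "O", (3.0, 0.0, 0.0)),
    ("CB", "C", (1.0, 2.0, 0.0)),
    ("HA", "H", (1.0, -1.0, 0.0)),
]

sc_point = reference_point(residue, "SC")  # centroid of side-chain heavy atoms
aa_point = reference_point(residue, "AA")  # centroid of all heavy atoms
ca_point = reference_point(residue, "CA")  # position of the CA atom
```

The resulting points would then serve as the coordinates for the WCN sums or as the seed points of the Voronoi tessellation.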
