Front Mol Biosci. 2021 Apr 30:8:626729.

doi: 10.3389/fmolb.2021.626729. eCollection 2021.

Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins

Benjamin Dubreuil¹, Emmanuel D Levy¹

PMID: 33996892
PMCID: PMC8119896
DOI: 10.3389/fmolb.2021.626729

Benjamin Dubreuil et al. Front Mol Biosci. 2021.

Affiliation

¹ Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel.

Abstract

An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is the residue level, where intra-protein differences are analyzed, and the second is the protein level, where inter-protein differences are studied. At the residue level, we know that solvent accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein's abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.

Keywords: contact number; intrinsic disorder; misfolding; misinteraction; protein abundance; protein evolution; protein structure; yeast proteome.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
The evolutionary rate of disordered regions is comparable to that of super-exposed regions in folded proteins. **(A)** Evolutionary information and structural features are mapped onto protein sequences from *S. cerevisiae*. The minimap represents the multiple sequence alignment of orthologous sequences to STI1. The amino acids are colored using CLUSTAL’s color scale (Thompson et al., 1994) depending on residue type and conservation. The zoomed-in panel illustrates residue-level conservation, which we calculated with Rate4Site (Pupko et al., 2002). We mapped the positions of PFAM (Bateman et al., 2002) and SUPERFAMILY (Gough and Chothia, 2002) domains (gray box), and of disordered regions predicted by IUPRED (Dosztányi, 2018) (cyan ribbon). We also mapped structural information available from PDB (Rose et al., 2017; Armstrong et al., 2019) and 3DComplex (Levy et al., 2006) on sequences. For this particular sequence, structural information was partially available based on PDB code 3UQ3 (Schmid et al., 2012). **(B)** Within proteins, the evolutionary rates of residues in different regions are averaged, and we compare the ratio of these averages. We show the median of ratios with error bars corresponding to the median absolute deviation. Surface and buried residues are defined based on relative ASA of >25 and ≤25%, respectively (Levy, 2010). **(C)** We calculate the same ratio as in panel **(B)**, between disordered regions and surface regions, using an increasingly stringent relative ASA cut-off to define surface residues. As we increase the cutoff, the median ratio tends toward 1, which highlights that disordered residues evolve only slightly faster than the most exposed residues at protein surfaces.
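
The calculation described for panel **(B)** can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy rates, and the toy relative ASA values are all hypothetical. Only the 25% relative ASA cutoff, the per-protein ratio of region averages, and the median with median absolute deviation come from the caption.

```python
from statistics import mean, median

def region_ratio(rates, rel_asa, cutoff=25.0):
    """Per-protein ratio of mean evolutionary rate: surface over buried.

    Surface residues have relative ASA > cutoff, buried residues <= cutoff
    (25% as stated in the caption). Returns None if either region is empty.
    """
    surface = [r for r, a in zip(rates, rel_asa) if a > cutoff]
    buried = [r for r, a in zip(rates, rel_asa) if a <= cutoff]
    if not surface or not buried:
        return None
    return mean(surface) / mean(buried)

def median_and_mad(values):
    """Median of the values and their median absolute deviation."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad

# Hypothetical toy data: per-residue (rates, relative ASA in %) per protein.
toy_proteins = {
    "protA": ([2.0, 1.0, 0.5, 0.5], [80, 40, 10, 5]),
    "protB": ([1.5, 0.5, 0.5], [60, 30, 20]),
    "protC": ([1.0, 0.25], [50, 0]),
}

ratios = [region_ratio(r, a) for r, a in toy_proteins.values()]
ratios = [x for x in ratios if x is not None]
print(median_and_mad(ratios))
```

Raising `cutoff` in `region_ratio` mimics the increasingly stringent surface definition of panel **(C)**, although there the numerator would be the disordered-region average rather than the surface average.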

**FIGURE 2**
The correlation in the conservation of disorder vs domain regions is poor among low abundance proteins and increases with abundance. **(A)** The top row shows the average evolutionary rate (ER) of surface residues (x-axis) vs buried residues (y-axis) per protein, for two classes of abundance (0–3 and 3–18 ppm, or parts per million). The lower row shows the average ER of disordered residues (x-axis) vs residues in domains (y-axis) per protein, for the same two classes of abundance. A protein falling on the diagonal (dashed line) means that residues in the two regions being compared have equal evolutionary rates (i.e., a ratio of 1). The Spearman rank correlation coefficient (r), the associated p-value (p, two-sided Spearman’s rank correlation test), and the number of proteins (n) within each class of abundance are given above each scatterplot. **(B)** Same as in panel **(A)**, for three classes of abundance (18–59, 59–352, and 352–21,866 ppm).
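
As an illustration of the per-class correlation reported above each scatterplot, the sketch below computes a Spearman rank correlation within abundance classes using only the Python standard library. It is a hedged example, not the authors' pipeline: the helper names and toy values are hypothetical, the half-open treatment of class boundaries is an assumption (the caption does not say which side is inclusive), and the p-value is not computed.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_by_class(proteins, edges, min_n=3):
    """proteins: (abundance_ppm, rate_x, rate_y) tuples.

    Classes are [lo, hi) for consecutive edges (an assumption).
    Returns {(lo, hi): (rho, n)} for classes with at least min_n proteins.
    """
    out = {}
    for lo, hi in zip(edges, edges[1:]):
        members = [(x, y) for ab, x, y in proteins if lo <= ab < hi]
        if len(members) >= min_n:
            xs, ys = zip(*members)
            out[(lo, hi)] = (spearman(xs, ys), len(members))
    return out

# Class boundaries taken from the caption; protein values are made up.
edges = [0, 3, 18, 59, 352, 21866]
toy = [(1.0, 1.0, 0.5), (2.0, 2.0, 0.9), (2.5, 3.0, 1.4),
       (100, 0.5, 0.4), (200, 0.7, 0.3), (300, 0.9, 0.2)]
print(spearman_by_class(toy, edges))
```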

**FIGURE 3**
The relative evolutionary rates of different protein regions are steady with abundance. Distribution of evolutionary rate ratios between different regions in the sequence (y-axis), across five classes of protein abundance (x-axis). A ratio is calculated by dividing the average evolutionary rates of residues found in two regions: **(A)** surface vs. buried, **(B)** disorder vs. domain. The white dashed line highlights the median ratio across bins of abundance. Overlaid box plots show the interquartile range (IQR = 25 to 75% quantiles) with their whiskers extending to 1.58 × IQR. Beyond this interval, the three most extreme outlier values are annotated. The number of proteins contributing to each distribution is given. We also highlight the relative rates for a pair of proteins, one with low and one with high abundance (STI1 and DBF4). These two proteins show comparable structural features, different evolutionary rates (respectively, 0.575 and 1.34 for their full sequence), and similar ratios.

**FIGURE 4**
Evolutionary rates of different regions and their ratio as a function of abundance. **(A)** Evolutionary rates (y-axis) as a function of protein abundance (x-axis) for surface regions, full-length structures, and buried regions. The ratio of evolutionary rate for surface vs buried regions is also shown as a function of abundance. Contour lines show the density of points. The median evolutionary rate and median protein abundance are shown by a horizontal and a vertical line, respectively. The Spearman rank correlation coefficient and p-value are given with the number of proteins in each dataset. A black line shows the fitted sigmoidal regression for each plot. We highlight two proteins, one with a low and one with a high abundance (DBF4 and STI1). Both have comparable structural features but different evolutionary rates. **(B)** Same representation as in panel **(A)**, now considering disordered versus domain regions.

**FIGURE 5**
Pairwise sequence identity across ortholog pairs. For each orthogroup we calculate the average percent sequence identity using all ortholog pairs or only pairs that include the *S. cerevisiae* protein. The distributions for these two measures are shown in dark and light blue, respectively. Vertical lines highlight the median. The number of orthogroups is 3,798.
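
The two averages described in this caption can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: sequences are assumed to be pre-aligned to equal length, percent identity is assumed to be counted over columns where neither sequence has a gap (the caption does not define the denominator), and the function names, species labels, and toy sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean

def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns ungapped in both sequences."""
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def orthogroup_identity(aligned, reference):
    """Average identity over all pairs, and over pairs including `reference`.

    aligned: dict mapping a sequence name to its aligned sequence.
    """
    all_pairs = [percent_identity(aligned[p], aligned[q])
                 for p, q in combinations(aligned, 2)]
    ref_pairs = [percent_identity(aligned[reference], seq)
                 for name, seq in aligned.items() if name != reference]
    return mean(all_pairs), mean(ref_pairs)

# Hypothetical toy orthogroup with three aligned sequences.
toy_orthogroup = {"scer": "MKTA", "sp2": "MKTG", "sp3": "MRTG"}
print(orthogroup_identity(toy_orthogroup, "scer"))
```

Here the all-pairs average also counts the `sp2`/`sp3` comparison, which is why the two measures can differ for the same orthogroup.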

Cited by

A Conserved Core Region of the Scaffold NEMO is Essential for Signal-induced Conformational Change and Liquid-liquid Phase Separation.
DiRusso CJ, DeMaria AM, Wong J, Jordanides JJ, Whitty A, Allen KN, Gilmore TD. bioRxiv [Preprint]. 2023 May 25:2023.05.25.542299. doi: 10.1101/2023.05.25.542299. Update in: J Biol Chem. 2023 Dec;299(12):105396. doi: 10.1016/j.jbc.2023.105396. PMID: 37292615. Free PMC article. Preprint.
A conserved core region of the scaffold NEMO is essential for signal-induced conformational change and liquid-liquid phase separation.
DiRusso CJ, DeMaria AM, Wong J, Wang W, Jordanides JJ, Whitty A, Allen KN, Gilmore TD. J Biol Chem. 2023 Dec;299(12):105396. doi: 10.1016/j.jbc.2023.105396. Epub 2023 Oct 27. PMID: 37890781. Free PMC article.
Substitution Models of Protein Evolution with Selection on Enzymatic Activity.
Ferreiro D, Khalil R, Sousa SF, Arenas M. Mol Biol Evol. 2024 Feb 1;41(2):msae026. doi: 10.1093/molbev/msae026. PMID: 38314876. Free PMC article.

References

1. Akashi H. (2003). Translational selection and yeast proteome evolution. Genetics 164, 1291–1303.
2. Armstrong D. R., Berrisford J. M., Conroy M. J., Gutmanas A., Anyango S., Choudhary P., et al. (2019). PDBe: improved findability of macromolecular structure data in the PDB. Nucleic Acids Res. 48, D335–D343.
3. Banani S. F., Lee H. O., Hyman A. A., Rosen M. K. (2017). Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 18, 285–298.
4. Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S. R., et al. (2002). The Pfam protein families database. Nucleic Acids Res. 30, 276–280.
5. Bellay J., Han S., Michaut M., Kim T., Costanzo M., Andrews B. J., et al. (2011). Bringing order to protein disorder through comparative genomics and genetic interactions. Genome Biol. 12:R14.

Associated data

figshare/10.6084/m9.figshare.13738657

