Front Mol Biosci. 2021 Apr 30:8:626729.
doi: 10.3389/fmolb.2021.626729. eCollection 2021.

Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins

Benjamin Dubreuil et al. Front Mol Biosci. 2021.

Abstract

An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is the residue level, where intra-protein differences are analyzed, and the second is the protein level, where inter-protein differences are studied. At the residue level, we know that solvent accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein's abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of the regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found that the conservation of disordered and structured regions increases in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.

Keywords: contact number; intrinsic disorder; misfolding; misinteraction; protein abundance; protein evolution; protein structure; yeast proteome.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
The evolutionary rate of disordered regions is comparable to that of super-exposed regions in folded proteins. (A) Evolutionary information and structural features are mapped onto protein sequences from S. cerevisiae. The minimap represents the multiple sequence alignment of orthologous sequences to STI1. The amino acids are colored using CLUSTAL’s color scale (Thompson et al., 1994) depending on residue type and conservation. The zoomed-in panel illustrates residue-level conservation, which we calculated with Rate4Site (Pupko et al., 2002). We mapped the positions of PFAM (Bateman et al., 2002) and SUPERFAMILY (Gough and Chothia, 2002) domains (gray box), and of disordered regions predicted by IUPRED (Dosztányi, 2018) (cyan ribbon). We also mapped structural information available from PDB (Rose et al., 2017; Armstrong et al., 2019) and 3DComplex (Levy et al., 2006) onto sequences. For this particular sequence, structural information was partially available based on PDB code 3UQ3 (Schmid et al., 2012). (B) Within each protein, the evolutionary rates of residues in different regions are averaged, and we compare the ratio of these averages. We show the median of ratios with error bars corresponding to the median absolute deviation. Surface and buried residues are defined based on relative ASA of >25 and ≤25%, respectively (Levy, 2010). (C) We calculate the same ratio as in panel (B), between disordered regions and surface regions, using an increasingly stringent relative ASA cut-off to define surface residues. As we increase the cut-off, the median ratio tends toward 1, which highlights that disordered residues evolve only slightly faster than the most exposed residues at protein surfaces.
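The per-protein ratio and its summary described in panel (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the label strings, and the toy rates are hypothetical, and the sketch assumes strictly positive per-residue rates so that a ratio of means is well defined.

```python
from statistics import mean, median


def region_ratio(rates, labels, numerator, denominator):
    """Ratio of mean evolutionary rates between two regions of one protein.

    rates  : per-residue evolutionary rates (assumed positive)
    labels : per-residue region labels, same length as rates
    Returns None when either region has no residues in this protein.
    """
    num = [r for r, lab in zip(rates, labels) if lab == numerator]
    den = [r for r, lab in zip(rates, labels) if lab == denominator]
    if not num or not den:
        return None
    return mean(num) / mean(den)


def median_and_mad(values):
    """Median and median absolute deviation (MAD) of a list of ratios."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


# Toy data (hypothetical): three proteins with labeled residues.
proteins = {
    "P1": ([2.0, 2.0, 1.0, 1.0], ["surface", "surface", "buried", "buried"]),
    "P2": ([1.5, 1.5, 1.0], ["surface", "surface", "buried"]),
    "P3": ([3.0, 1.0], ["surface", "buried"]),
}

ratios = [
    region_ratio(rates, labels, "surface", "buried")
    for rates, labels in proteins.values()
]
ratios = [r for r in ratios if r is not None]
summary = median_and_mad(ratios)
```

The ratio is computed within each protein before summarizing across proteins, so that differences in overall evolutionary rate between proteins cancel out.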
FIGURE 2
The correlation in the conservation of disorder vs domain regions is poor among low abundance proteins and increases with abundance. (A) The top row shows the average evolutionary rate (ER) of surface residues (x-axis) vs buried residues (y-axis) per protein, for two classes of abundance (0–3 and 3–18 ppm or parts per million). The lower row shows the average ER of disordered residues (x-axis) vs residues in domains (y-axis) per protein, for the same two classes of abundance. A protein falling on the diagonal (dashed line) means that residues in the two regions being compared have equal evolutionary rates (i.e., a ratio of 1). The Spearman rank correlation coefficient (r), the associated p-value (p, two-sided Spearman’s rank correlation test), and the number of proteins (n) within each class of abundance are given above each scatterplot. (B) Same as in panel (A), for three classes of abundance (18–59, 59–352, and 352–21,866 ppm or parts per million).
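For readers unfamiliar with the Spearman rank correlation reported above each scatterplot, the coefficient is the Pearson correlation computed on ranks. The sketch below uses only the standard library and hypothetical function names; it omits the p-value, which a statistics package would normally supply, and it is not the code used to produce the figure.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-protein average rates (hypothetical values).
surface_er = [0.4, 0.9, 1.3, 2.1]
buried_er = [0.2, 0.5, 0.6, 1.0]
r = spearman(surface_er, buried_er)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic rescaling of the rates, which suits comparisons across abundance classes with very different ranges.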
FIGURE 3
The relative evolutionary rates of different protein regions are steady with abundance. Distribution of the ratio of evolutionary rates between different regions in the sequence (y-axis), across five classes of protein abundance (x-axis). A ratio is calculated by dividing the average evolutionary rates of residues found in two regions: (A) surface vs. buried, (B) disorder vs. domain. The white dashed line highlights the median ratio across bins of abundance. Overlaid box plots show the interquartile range (IQR = 25 to 75% quantiles) with their whiskers extending to 1.58 × IQR. Beyond this interval, the three most extreme outlier values are annotated. The number of proteins contributing to each distribution is given. We also highlight the relative rates for a pair of proteins, one with low and one with high abundance (STI1 and DBF4). These two proteins show comparable structural features, different evolutionary rates (respectively, 0.575 and 1.34 for their full sequence), and similar ratios.
FIGURE 4
Evolutionary rates of different regions and their ratio as a function of abundance. (A) Evolutionary rates (y-axis) as a function of protein abundance (x-axis) for surface regions, full-length structures, and buried regions. The ratio of evolutionary rate for surface vs buried regions is also shown as a function of abundance. Contour lines show the density of points. The median evolutionary rate and median protein abundance are shown by a vertical and horizontal line, respectively. The Spearman rank correlation coefficient and p-value are given with the number of proteins in each dataset. A black line shows the fitted sigmoidal regression for each plot. We highlight two proteins, one with a low and one with a high abundance (DBF4 and STI1). Both have comparable structural features but different evolutionary rates. (B) Same representation as in panel (A), now considering disordered versus domain regions.
FIGURE 5
Pairwise sequence identity across ortholog pairs. For each orthogroup, we calculate the average percent sequence identity using all ortholog pairs or only pairs that include the S. cerevisiae protein. The distributions for these two measures are shown in dark and light blue, respectively. Vertical lines highlight the median. The number of orthogroups is 3,798.
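The two averages described in this caption can be illustrated with a short sketch. The caption does not state how identity is computed, so the definition used here (identical residues divided by alignment columns where neither sequence has a gap) is an assumption, as are the function names, the species keys, and the toy aligned sequences.

```python
from itertools import combinations
from statistics import mean


def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences.

    Assumed definition: matches / columns where neither sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def mean_identity(orthogroup, reference=None):
    """Average pairwise identity within one orthogroup.

    orthogroup : dict mapping species name -> aligned sequence
    reference  : if given, keep only pairs that include this species
    """
    pairs = combinations(sorted(orthogroup), 2)
    if reference is not None:
        pairs = (p for p in pairs if reference in p)
    values = [percent_identity(orthogroup[s], orthogroup[t]) for s, t in pairs]
    return mean(values)  # raises StatisticsError if no pair qualifies


# Toy orthogroup (hypothetical aligned sequences).
orthogroup = {"scer": "MKTA", "spar": "MKTA", "klac": "MRTA"}
all_pairs = mean_identity(orthogroup)
with_reference = mean_identity(orthogroup, reference="scer")
```

In this toy case the reference-only average (87.5%) exceeds the all-pairs average (about 83.3%), because the pair that excludes the reference species is one of the more divergent ones.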

References

    Akashi H. (2003). Translational selection and yeast proteome evolution. Genetics 164 1291–1303.
    Armstrong D. R., Berrisford J. M., Conroy M. J., Gutmanas A., Anyango S., Choudhary P., et al. (2019). PDBe: improved findability of macromolecular structure data in the PDB. Nucleic Acids Res. 48 D335–D343.
    Banani S. F., Lee H. O., Hyman A. A., Rosen M. K. (2017). Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 18 285–298.
    Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S. R., et al. (2002). The Pfam protein families database. Nucleic Acids Res. 30 276–280.
    Bellay J., Han S., Michaut M., Kim T., Costanzo M., Andrews B. J., et al. (2011). Bringing order to protein disorder through comparative genomics and genetic interactions. Genome Biol. 12:R14.