Genome Biol Evol. 2017 Oct 1;9(10):2879-2892.
doi: 10.1093/gbe/evx191.

Net Evolutionary Loss of Residue Polarity in Drosophilid Protein Cores Indicates Ongoing Optimization of Amino Acid Composition

Lev Y Yampolsky et al. Genome Biol Evol. 2017.

Abstract

Amino acid frequencies in proteins may not be at equilibrium. We consider two possible explanations for the nonzero net residue fluxes in drosophilid proteins. First, protein interiors may have a suboptimal residue composition and be under selective pressure favoring stability, leading to the loss of polar (and the gain of large) amino acids. One would then expect stronger net fluxes in the protein interior than at exposed sites. Alternatively, if most of the polarity loss occurs at exposed sites and the selective constraint on amino acid composition at such sites decreases over time, the net loss of polarity may be neutral, caused by a disproportionately high occurrence of polar residues at exposed, least-constrained sites. We estimated net evolutionary fluxes of residue polarity and volume at sites with different solvent accessibility in conserved protein families from 12 species of Drosophila. A net loss of polarity, minuscule in magnitude but consistent across all lineages, occurred at all sites except the most exposed ones, where the net flux of polarity was close to zero or, in membrane proteins, even positive. At intermediate solvent accessibility, the net fluxes of polarity and volume were similar to neutral predictions, whereas much of the polarity loss not attributable to neutral expectations occurred at buried sites. These observations are consistent with the hypothesis that residue composition in many proteins is structurally suboptimal and continues to evolve toward lower polarity in the protein interior, particularly in proteins with intracellular localization. The magnitude of polarity and volume changes was independent of the protein's evolutionary age, indicating either that the approach to equilibrium has been slow or that no single equilibrium exists.
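A net flux of residue polarity can be illustrated as the mean change in polarity over all substitutions inferred between an ancestral and a descendant sequence. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes Grantham's (1974) polarity scale (the abstract does not say which scale the study used), assumes two aligned, ungapped protein sequences, and uses a hypothetical helper `net_polarity_flux`.

```python
# Grantham (1974) polarity values; an assumed scale, used here for illustration only.
GRANTHAM_POLARITY = {
    "S": 9.2, "R": 10.5, "L": 4.9, "P": 8.0, "T": 8.6,
    "A": 8.1, "V": 5.9, "G": 9.0, "I": 5.2, "F": 5.2,
    "Y": 6.2, "C": 5.5, "H": 10.4, "Q": 10.5, "N": 11.6,
    "K": 11.3, "D": 13.0, "E": 12.3, "M": 5.7, "W": 5.4,
}


def net_polarity_flux(ancestor: str, descendant: str) -> tuple[float, int]:
    """Return (mean polarity change per substitution, number of substitutions).

    Hypothetical helper: compares two aligned, equal-length protein sequences
    and skips gaps or unrecognized residues.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for anc, desc in zip(ancestor, descendant):
        if anc != desc and anc in GRANTHAM_POLARITY and desc in GRANTHAM_POLARITY:
            # Positive values mean the site became more polar; negative, less polar.
            changes.append(GRANTHAM_POLARITY[desc] - GRANTHAM_POLARITY[anc])
    if not changes:
        return 0.0, 0
    return sum(changes) / len(changes), len(changes)


# Toy sequences: S->L and D->E lose polarity, A->G gains a little.
mean_change, n = net_polarity_flux("MSADK", "MLGEK")
print(f"{n} substitutions, mean dPolarity = {mean_change:.2f}")
```

A negative mean, as in this toy case, corresponds to a net loss of polarity; in a real analysis such changes would be pooled by site class and lineage rather than computed for a single pair of sequences.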

Keywords: Drosophila; residue polarity; residue volume; selection; solvent accessibility; stability.

Figures

Fig. 1.—
Schematic representation of the two hypotheses for the existence of net polarity fluxes. (A) The protein interior is under selection for lower average polarity; the optimal composition has not yet been reached. (B) Recent relaxation of selective constraint on surface (and possibly core) sites; mutational equilibrium has not yet been reached. t, time; P, polarity; TB, birth of a protein; TR, relaxation of selective constraints; OC, optimum polarity for the protein core; OS, optimum polarity for the protein surface; EM, polarity at mutational equilibrium. Red, protein core; blue, protein surface.
Fig. 2.—
(A) Phylogenetic tree of 12 Drosophila species, with the vertical position of each node reflecting the average polarity of amino acids in the proteins reconstructed at that node. Vertical bars are standard errors due to variability among proteins and sites. Species names: dwil = D. willistoni, dana = D. ananassae, dmel = D. melanogaster, dsim = D. simulans, dyak = D. yakuba, dsec = D. sechellia, dere = D. erecta, dpse = D. pseudoobscura, dper = D. persimilis, dgri = D. grimshawi, dmoj = D. mojavensis, dvir = D. virilis. Regression coefficient of mean polarity of extant or reconstructed protein sequences over age in nucleotide substitution units = −0.00405 (P < 1E-8). (B) Mean change in residue volume (Å³, squares, left axis) and polarity (arbitrary units, circles, right axis). Vertical bars are approximate 95% CIs. Linear regressions for volume and polarity change, respectively: dVolume = 0.35 + 1.27×age (P > 0.09); dPolarity = −0.21 + 0.17×age (P < 0.006). The horizontal axis is the same in (A) and (B) and represents time before present in nucleotide substitution units.
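The panel (B) regressions, such as dPolarity = −0.21 + 0.17×age, are linear fits of mean change against age. The sketch below shows an ordinary least-squares fit of that form; `fit_line` is a hypothetical helper, P-values are not computed, and the toy points are not the study's data. They were constructed to lie exactly on the panel (B) polarity line, so the fit should recover its coefficients.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x and cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy ages (nucleotide substitution units) and mean dPolarity values chosen to
# satisfy dPolarity = -0.21 + 0.17 * age exactly; not data from the study.
ages = [0.0, 0.5, 1.0, 1.5]
d_polarity = [-0.21, -0.125, -0.04, 0.045]
intercept, slope = fit_line(ages, d_polarity)
print(f"dPolarity = {intercept:.2f} + {slope:.2f} x age")
```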
Fig. 3.—
Mean net change in polarity (A, B; circles) and volume (C, D; squares) at sites grouped by the relative solvent accessibility of their neighboring sites (context RSA). Context RSA was calculated either from I-TASSER-predicted solvent accessibility (all proteins; A, C) or from experimentally determined solvent accessibility (64 proteins with structure data; B, D). Vertical bars are approximate 95% CIs. Neutral expectations: solid line, no mutational biases; dashed line, 2-fold transition/transversion bias; dotted line, 2-fold transition/transversion bias and 10-fold CpG bias.
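A neutral expectation of this kind can be approximated, in one simple formulation assumed here, as the mutation-weighted mean polarity change over all single-nucleotide, nonsynonymous, non-stop mutations of a codon. Transitions are weighted by a factor `kappa` (2 for the dashed line). This is a sketch, not the authors' procedure. It uses the standard genetic code, assumes Grantham (1974) polarity values, omits the 10-fold CpG bias, and `expected_polarity_change` is a hypothetical function.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Grantham (1974) polarity values; an assumed scale for illustration.
POLARITY = {
    "S": 9.2, "R": 10.5, "L": 4.9, "P": 8.0, "T": 8.6,
    "A": 8.1, "V": 5.9, "G": 9.0, "I": 5.2, "F": 5.2,
    "Y": 6.2, "C": 5.5, "H": 10.4, "Q": 10.5, "N": 11.6,
    "K": 11.3, "D": 13.0, "E": 12.3, "M": 5.7, "W": 5.4,
}
PURINES = {"A", "G"}


def is_transition(old: str, new: str) -> bool:
    """A transition replaces a purine with a purine or a pyrimidine with a pyrimidine."""
    return old != new and (old in PURINES) == (new in PURINES)


def expected_polarity_change(codon: str, kappa: float = 1.0) -> float:
    """Mutation-weighted mean polarity change over nonsynonymous, non-stop point mutants."""
    source = CODON_TABLE[codon]
    if source == "*":
        raise ValueError("stop codons have no amino acid")
    total_weight = 0.0
    weighted_change = 0.0
    for pos, old in enumerate(codon):
        for new in BASES:
            if new == old:
                continue
            mutant = codon[:pos] + new + codon[pos + 1:]
            target = CODON_TABLE[mutant]
            if target in ("*", source):
                continue  # skip nonsense and synonymous mutations
            weight = kappa if is_transition(old, new) else 1.0
            total_weight += weight
            weighted_change += weight * (POLARITY[target] - POLARITY[source])
    return weighted_change / total_weight if total_weight else 0.0


for k in (1.0, 2.0):
    print(f"GAT (Asp), kappa={k}: {expected_polarity_change('GAT', k):.3f}")
```

Averaging this quantity over the codons observed in a given site class would give one possible neutral baseline for that class.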
Fig. 4.—
Mean net change in polarity (A–D) and volume (E–H) at sites grouped by the relative solvent accessibility of their neighboring sites (context RSA) in proteins with intracellular (A and E), membrane (B and F), extracellular (C and G), and unknown (D and H) cellular localization according to FlyBase GO annotations. Note the different scales. Symbols, lines, and error bars as in figure 3.
Fig. 5.—
The relationship between protein family age (approximate Myr, log scale) and mean net change in polarity. Black dots: all substitutions; small red dots: protein interiors (I-TASSER-estimated context RSA 0–0.2); medium purple circles: intermediate sites (context RSA 0.3–0.5); large blue circles: exterior sites (context RSA > 0.5). Horizontal bars represent the ranges of age estimates (supplementary table S1, Supplementary Material online). Vertical bars are approximate 95% CIs; for clarity, they are shown only for the all-sites data points. Linear regression coefficients of mean dPolarity over log-transformed protein age: −0.042 (P > 0.40), 0.012 (P > 0.88), −0.066 (P > 0.28), and −0.082 (P > 0.81) for all, buried, intermediate, and surface sites, respectively.
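The site classes used here (context RSA 0–0.2, 0.3–0.5, and > 0.5) and the approximate 95% CIs can be sketched as follows. Several assumptions are ours, not the source's. Context RSA is taken as the mean RSA of sequence neighbours within a hypothetical ±`window` positions, because the captions do not say how neighbours are defined. The CI uses the normal approximation, mean ± 1.96 standard errors. Boundary handling is also our choice, and sites with context RSA between 0.2 and 0.3 are left unclassified to mirror the gap in the caption's ranges. `context_rsa`, `site_class`, and `summarize` are hypothetical helpers.

```python
import math
from statistics import mean, stdev


def context_rsa(rsa: list[float], window: int = 2) -> list[float]:
    """Mean RSA of sequence neighbours within +/- window, excluding the site itself (assumed definition)."""
    out = []
    for i in range(len(rsa)):
        lo, hi = max(0, i - window), min(len(rsa), i + window + 1)
        neighbours = [rsa[j] for j in range(lo, hi) if j != i]
        out.append(mean(neighbours) if neighbours else rsa[i])
    return out


def site_class(ctx: float):
    """Map a context RSA value to the caption's classes; None for the 0.2-0.3 gap."""
    if ctx <= 0.2:
        return "buried"
    if 0.3 <= ctx <= 0.5:
        return "intermediate"
    if ctx > 0.5:
        return "surface"
    return None


def summarize(changes: list[float], ctx_values: list[float]) -> dict:
    """Per-class mean change and half-width of a normal-approximation 95% CI."""
    groups: dict[str, list[float]] = {}
    for change, ctx in zip(changes, ctx_values):
        cls = site_class(ctx)
        if cls is not None:
            groups.setdefault(cls, []).append(change)
    summary = {}
    for cls, vals in groups.items():
        # A CI needs at least two observations; report NaN otherwise.
        half = 1.96 * stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        summary[cls] = (mean(vals), half)
    return summary


print(summarize([-1.0, -3.0, 0.5], [0.1, 0.1, 0.8]))
```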
Fig. 6.—
(A) Principal components analysis of 880 Drosophila melanogaster proteins in the analyzed data set by the relative frequencies of the 20 amino acids. Blue dots, intracellular proteins; purple dots, extracellular proteins; yellow dots, membrane proteins. Proteins with unknown cellular localization are included in the analysis but not shown. (B) Principal components analysis of 905 Drosophila spp. proteins in the analyzed data set by net gains/losses of the 20 amino acids.
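An ordination like panel (A) can be computed as a principal components analysis via singular value decomposition of the column-centred protein-by-amino-acid composition matrix. The captions do not say whether columns were also scaled, so this sketch only centres them. NumPy is an assumed dependency, and `composition_matrix` and `pca_scores` are hypothetical helpers.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_matrix(sequences: list[str]) -> np.ndarray:
    """Rows are proteins, columns the relative frequencies of the 20 amino acids."""
    rows = []
    for seq in sequences:
        counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
        total = counts.sum()
        rows.append(counts / total if total else counts)
    return np.vstack(rows)


def pca_scores(matrix: np.ndarray, n_components: int = 2):
    """Project column-centred rows onto the leading principal axes."""
    centered = matrix - matrix.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:n_components]


scores, explained = pca_scores(composition_matrix(["AAAA", "LLLL", "AALL"]))
print(scores.round(3), explained.round(3))
```

Panel (B) could be sketched the same way by passing a matrix of per-protein net gains/losses of each amino acid to `pca_scores` in place of the composition matrix.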
