Proc Natl Acad Sci U S A. 2020 Mar 17;117(11):5907-5912.
doi: 10.1073/pnas.1911203117. Epub 2020 Mar 3.

Frameshifting preserves key physicochemical properties of proteins

Lukas Bartonek et al. Proc Natl Acad Sci U S A. 2020.

Abstract

Frameshifts in protein-coding sequences are widely perceived as resulting in either nonfunctional or even deleterious protein products. Indeed, frameshifts typically lead to markedly altered protein sequences and premature stop codons. By analyzing complete proteomes from all three domains of life, we demonstrate that, in contrast, several key physicochemical properties of protein sequences exhibit significant robustness against +1 and −1 frameshifts. In particular, we show that hydrophobicity profiles of many protein sequences remain largely invariant upon frameshifting. For example, over 2,900 human proteins exhibit a Pearson's correlation coefficient R greater than 0.7 between the hydrophobicity profiles of the original and the +1-frameshifted variants, despite an average sequence identity between the two of only 6.5% in this group. We observe a similar effect for protein sequence profiles of affinity for certain nucleobases and for protein sequence profiles of intrinsic disorder. Finally, an analysis of significance and optimality demonstrates that frameshift stability is embedded in the structure of the universal genetic code and may have contributed to shaping it. Our results suggest that frameshifting may be a powerful evolutionary mechanism for creating new proteins with vastly different sequences, yet similar physicochemical properties to the proteins from which they originate.
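The profile comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the Kyte-Doolittle scale stands in for the Factor 1 hydrophobicity scale used in the paper, the 9-residue window is an assumed value, stop codons inside a window are simply skipped, and sequence identity is computed position by position without alignment. A +1 frameshift is read from nucleotide offset 1 and a −1 frameshift from offset 2; because the i-th codon of the offset-2 frame overlaps wild-type codon i+1 by two nucleotides, the wild-type profile is advanced by one position before comparison. All function names (`translate`, `hydrophobicity_profile`, `frameshift_r`, `identity`) are hypothetical.

```python
import math

# Universal genetic code: bases enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}

# Kyte-Doolittle hydrophobicity: a stand-in, NOT the paper's Factor 1 scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def translate(cds: str, offset: int = 0) -> str:
    """Translate codon by codon from nucleotide `offset`; incomplete trailing codons are dropped."""
    cds = cds.upper()
    return "".join(CODE.get(cds[i:i + 3], "X") for i in range(offset, len(cds) - 2, 3))


def hydrophobicity_profile(protein: str, scale: dict, window: int = 9) -> list:
    """Sliding-window mean of scale values; stops ('*') and unknown residues are skipped."""
    profile = []
    for start in range(len(protein) - window + 1):
        vals = [scale[aa] for aa in protein[start:start + window] if aa in scale]
        profile.append(sum(vals) / len(vals) if vals else 0.0)
    return profile


def pearson(x: list, y: list) -> float:
    """Pearson's R over the common prefix of two profiles (shifted frames are shorter)."""
    n = min(len(x), len(y))
    if n < 2:
        return float("nan")
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def frameshift_r(cds: str, shift: int, scale: dict = KD, window: int = 9) -> float:
    """R between wild-type and +1/-1 frameshifted profiles, aligned by codon overlap."""
    wt = hydrophobicity_profile(translate(cds, 0), scale, window)
    if shift == 1:
        fs = hydrophobicity_profile(translate(cds, 1), scale, window)  # codon i overlaps WT codon i
    elif shift == -1:
        fs = hydrophobicity_profile(translate(cds, 2), scale, window)  # codon i overlaps WT codon i+1
        wt = wt[1:]
    else:
        raise ValueError("shift must be +1 or -1")
    return pearson(wt, fs)


def identity(p: str, q: str) -> float:
    """Fraction of identical residues at matching positions (no alignment)."""
    n = min(len(p), len(q))
    return sum(a == b for a, b in zip(p, q)) / n if n else 0.0


if __name__ == "__main__":
    # Hypothetical 24-codon toy sequence, for illustration only.
    cds = "ATGGCTCTGATCGTTAAAGAAGCTTTCCTGGGTAGCCTGATCGCAGTTCTGGAAAAACTGGCGTTCATTGCT"
    for shift, offset in ((1, 1), (-1, 2)):
        ident = identity(translate(cds), translate(cds, offset))
        print(f"{shift:+d}: R = {frameshift_r(cds, shift):.2f}, identity = {ident:.1%}")
```

With real data, `cds` would be a complete coding sequence and `scale` any of the per-residue scales; truncating to the common prefix is enough here because a shifted frame loses at most one codon.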

Keywords: evolution; frameshift; genetic code; hydrophobicity.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Frameshifting at the level of the universal genetic code (UGC). (A) Histogram of Pearson’s correlation coefficients R between the UGC and its frameshifted version for all 604 scales investigated. Scales are grouped by category (alpha: α and turn propensity; beta: β propensity; hydro: hydrophobicity; nuc: nucleobase affinity; other) and presented as a stacked, normalized histogram. The expected density derived via a random model is shown as a dashed line, and the highest achievable Pearson’s R obtained for computationally optimized scales is marked by an arrow. (B) First panel: Pearson’s Rs for the UGC with the associated P values, select scales indicated. Other panels: P values of the 604 studied scales grouped by category, with Gaussian jitter added along the x-dimension to separate the data points. (C) Clustering of scales according to optimal frameshift stability: The 604 studied scales were transformed into the space defined by the PCA of scales computationally optimized for frameshift stability at the UGC level. The first and second principal components of this space (PC1 and PC2) explain almost 100% of the variance of the optimal scales, which therefore lie on a circle (gray, dashed). The 604 physicochemical scales are shown as individual dots whose sizes reflect the negative logarithm of their P values as in B. Note that scales with significant frameshift stability tend to cluster near scales with optimal frameshift stability. The arrows correspond to the relative contributions of the respective amino acids to PC1 and PC2.
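The caption does not spell out how the code itself is frameshifted or what the random model is, so the following is only one plausible reading, stated as an assumption: each sense codon xyz is paired with the four codons yzN that a +1 shift would read (Nxy for a −1 shift), the scale value of the original amino acid is compared with the mean value over the non-stop amino acids encoded by those shifted codons, and a one-sided P value is estimated by shuffling the scale's values among the 20 amino acids. The Kyte-Doolittle scale is used purely as an example input, and `ugc_frameshift_r` and `permutation_p` are hypothetical names.

```python
import math
import random

# Universal genetic code: bases enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}

# Kyte-Doolittle hydrophobicity, used only as an example scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def pearson(x: list, y: list) -> float:
    """Plain Pearson's R; NaN for zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def ugc_frameshift_r(scale: dict, shift: int = 1) -> float:
    """Assumed pairing: sense codon xyz vs. codons yzN (+1 shift) or Nxy (-1 shift)."""
    if shift not in (1, -1):
        raise ValueError("shift must be +1 or -1")
    original, shifted = [], []
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons carry no scale value
        reads = [codon[1:] + n if shift == 1 else n + codon[:2] for n in BASES]
        vals = [scale[CODE[c]] for c in reads if CODE[c] != "*"]
        if vals:
            original.append(scale[aa])
            shifted.append(sum(vals) / len(vals))  # mean over the non-stop shifted reads
    return pearson(original, shifted)


def permutation_p(scale: dict, shift: int = 1, n_perm: int = 2000, seed: int = 0):
    """Assumed random model: shuffle the scale's values among the 20 amino acids."""
    rng = random.Random(seed)
    observed = ugc_frameshift_r(scale, shift)
    residues, values = list(scale), list(scale.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if ugc_frameshift_r(dict(zip(residues, values)), shift) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # add-one correction avoids P = 0


if __name__ == "__main__":
    for s in (1, -1):
        r, p = permutation_p(KD, shift=s)
        print(f"shift {s:+d}: R = {r:.3f}, one-sided P = {p:.4f}")
```

Seeding the generator keeps P values reproducible; applying the same function to each of a collection of scales would yield a histogram of R values comparable in spirit to panel A, under the assumptions above.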
Fig. 2.
Frameshifting at the level of protein sequences. (A) Comparison of wild-type and +1 (Upper, red) and −1 (Lower, blue) frameshifted Factor 1 hydrophobicity profiles for Ser/Thr phosphatase 4 regulatory subunit 4 protein (UniProtID Q6NUP7) with the associated Pearson’s Rs. (B) Distributions of Pearson’s R for wild-type vs. +1 frameshift (red) and wild-type vs. −1 frameshift (blue) for Factor 1 over all human proteins (n = 17,083) with their medians indicated. (C) Histogram of median Pearson’s R between wild-type and +1 frameshifted profiles in human for all 604 investigated scales, grouped by category and presented as a stacked, normalized histogram. The expected density derived via a random model is shown as a dashed line. (D) Comparison of P values for frameshifting at the level of the UGC and of +1 frameshifted human sequences for the 604 studied scales. (E) Enrichment of GO cellular component (CC) terms in the top quartile of human sequences ranked by Pearson’s R between wild-type and +1 or −1 frameshifted Factor 1 profiles (low P values: light green; high P values: dark green). (F) Comparison of Pearson’s Rs (wild-type vs. +1 frameshifted Factor 1 profiles) between orthologous proteins in H. sapiens and M. musculus (n = 12,174; R = 0.74). The same is shown for −1 frameshifts in the Inset (n = 12,174; R = 0.71).
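Once a per-protein Pearson's R between wild-type and frameshifted profiles is available, the proteome-level summaries in panels B, E, and F reduce to simple statistics. The sketch below is illustrative only: `summarize` and `ortholog_agreement` are hypothetical names, the protein IDs are toy placeholders, the top quartile is taken as R at or above the third-quartile cut point from Python's `statistics.quantiles`, and the GO enrichment test that would use that set is not shown.

```python
import math
import statistics


def pearson(x: list, y: list) -> float:
    """Plain Pearson's R; NaN for fewer than two points or zero variance."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def summarize(r_by_protein: dict):
    """Median R and the sorted IDs in the top quartile (R at or above the third quartile)."""
    values = list(r_by_protein.values())
    q3 = statistics.quantiles(values, n=4)[2]  # third-quartile cut point
    top = sorted(pid for pid, r in r_by_protein.items() if r >= q3)
    return statistics.median(values), top


def ortholog_agreement(r_a: dict, r_b: dict, pairs: list) -> float:
    """Pearson's R of frameshift R values over ortholog pairs present in both species."""
    kept = [(r_a[a], r_b[b]) for a, b in pairs if a in r_a and b in r_b]
    return pearson([x for x, _ in kept], [y for _, y in kept])


if __name__ == "__main__":
    # Toy values for hypothetical proteins, not data from the paper.
    median_r, top_quartile = summarize({"P1": 0.1, "P2": 0.2, "P3": 0.3, "P4": 0.9, "P5": 0.8})
    print(median_r, top_quartile)  # 0.3 ['P4']
```

Ortholog pairs missing from either species are silently dropped, which mirrors the idea that panel F is restricted to proteins with a mapped ortholog.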
Fig. 3.
Robustness of a membrane protein’s hydrophobicity profile against frameshifting. (A) Factor 1 hydrophobicity profiles of wild-type sodium/potassium/calcium exchanger (UniProtID O60721) and its +1 frameshifted variant with relevant regions indicated with dashed lines. Close-up of the profiles in the first (B) and the second (C) transmembrane domains of the protein. Note that the specific locations of transmembrane helices are matched in all cases but one. (D) Comparison of wild-type and +1 frameshifted sequences in a region outside the transmembrane domains together with the associated Factor 1 profiles. (E) Inversion of the charge pattern upon +1 frameshift with a retained hydrophobicity profile.
Fig. 4.
Frameshifting in the context of GUA affinity and intrinsic disorder. (A) Distributions of Pearson’s R between GUA-affinity profiles of wild-type and +1 or −1 frameshifted human protein sequences (n = 17,083). (B) RNA vs. protein: Distributions of Pearson’s R between mRNA PUR-density profiles and the autologous protein’s GUA-affinity profiles in human for wild-type, +1 and −1 frameshifted sequences (n = 17,083) with medians indicated. Note that matched profiles are indicated by negative Pearson’s Rs due to the standard definition of GUA-affinity scales. (C) RNA vs. protein: Comparison of an mRNA PUR-density profile and the protein GUA-affinity profile for the −1 frameshifted sequence of the nuclear RNA export factor (UniProtID: Q9GZY0). (D) Comparison of disorder values averaged over full sequences (avg. disorder) in wild-type (WT) and +1 (Left, red) or −1 (Right, blue) frameshifted sequences. (E) Example IUPRED (33) intrinsic disorder profiles of a wild-type protein and its +1 frameshifted variant (UniProtID: P07093).
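Panel B compares an mRNA purine (PUR) density profile with the encoded protein's GUA-affinity profile. The sketch below assumes that PUR density is the fraction of A and G over a sliding window of codons, so that it lines up position by position with a per-residue protein profile; the window size, the `offset` parameter, and the names `pur_density` and `pearson` are assumptions, and because the GUA-affinity scale values are not reproduced here, the protein profile is simply taken as an input.

```python
import math


def pur_density(mrna: str, window: int = 7, offset: int = 0) -> list:
    """Purine (A/G) fraction per codon, averaged over `window` consecutive codons.

    Works for both RNA (U) and DNA (T) alphabets, since only A and G are counted.
    `offset` selects the reading frame used to group nucleotides into codons (assumed).
    """
    seq = mrna.upper()
    per_codon = [sum(base in "AG" for base in seq[i:i + 3]) / 3
                 for i in range(offset, len(seq) - 2, 3)]
    return [sum(per_codon[i:i + window]) / window
            for i in range(len(per_codon) - window + 1)]


def pearson(x: list, y: list) -> float:
    """Pearson's R over the common prefix of two per-codon profiles."""
    n = min(len(x), len(y))
    if n < 2:
        return float("nan")
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")
```

In use, `pearson(pur_density(mrna), gua_profile)` would be computed with a GUA-affinity profile built with the same window; following the caption, a strongly negative R, not a positive one, indicates matched profiles.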
Fig. 5.
Evolution of protein sequences via frameshifting. (A) Frameshifts enable major jumps in protein sequence space with little change in key physicochemical properties like hydrophobicity. (B) An insertion or deletion (INDEL) results in a frameshifted gene with potential premature stop codons. These can be removed by either single point mutations or another INDEL-induced frameshift. AUG: start codon.
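The scenario in panel B can be made concrete with a toy example. The gene and the helper names (`premature_stop`, `delete_base`, `insert_base`, `substitute`) are hypothetical: a single-base deletion shifts the downstream reading frame and exposes a premature stop codon, which is then removed either by a point mutation inside that stop codon or by a compensating single-base insertion that restores the original frame downstream.

```python
# Universal genetic code: bases enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def codons(cds: str) -> list:
    """Split a coding sequence into complete codons, starting at the AUG/ATG frame."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def translate(cds: str) -> str:
    return "".join(CODE[c] for c in codons(cds))


def premature_stop(cds: str):
    """Index of the first in-frame stop codon before the final codon, or None."""
    cs = codons(cds)
    for i, c in enumerate(cs[:-1]):
        if CODE[c] == "*":
            return i
    return None


def delete_base(cds: str, pos: int) -> str:
    return cds[:pos] + cds[pos + 1:]


def insert_base(cds: str, pos: int, base: str) -> str:
    return cds[:pos] + base + cds[pos:]


def substitute(cds: str, pos: int, base: str) -> str:
    return cds[:pos] + base + cds[pos + 1:]


if __name__ == "__main__":
    wt = "ATGCTGACCGGGCCCTAA"                          # toy gene: MLTGP*
    shifted = delete_base(wt, 3)                       # INDEL right after the start codon
    print(translate(shifted), premature_stop(shifted))  # M*PGP 1
    print(translate(substitute(shifted, 5, "G")))       # point mutation TGA -> TGG: MWPGP
    print(translate(insert_base(shifted, 4, "C")))      # compensating insertion: MSTGP*
```

The point-mutation route keeps the shifted frame (and therefore a new downstream sequence), whereas the compensating insertion returns the downstream codons to the wild-type frame; only the former yields the kind of sequence jump sketched in panel A.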

References

    1. Stenson P. D., et al., Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003).
    2. Mertins P., et al., Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).
    3. Garcia-Diaz M., Kunkel T. A., Mechanism of a genetic glissando: Structural biology of indel mutations. Trends Biochem. Sci. 31, 206–214 (2006).
    4. Maki H., Origins of spontaneous mutations: Specificity and directionality of base-substitution, frameshift, and sequence-substitution mutageneses. Annu. Rev. Genet. 36, 279–303 (2002).
    5. Hu J., Ng P. C., Predicting the effects of frameshifting indels. Genome Biol. 13, R9 (2012).