Heliyon. 2017 Dec 28;3(12):e00492.
doi: 10.1016/j.heliyon.2017.e00492. eCollection 2017 Dec.

Amino acid repeats avert mRNA folding through conservative substitutions and synonymous codons, regardless of codon bias

Sailen Barik. Heliyon.

Abstract

A significant number of proteins in all living species contain amino acid repeats (AARs) of various lengths and compositions, many of which play important roles in protein structure and function. Here, I have surveyed selected homopolymeric single [(A)n] and double [(AB)n] AARs in the human proteome. A close examination of their codon patterns and an analysis of their RNA structure propensity led to the following set of empirical rules: (1) one class of amino acid repeats (Class I) uses a mixture of synonymous codons, some of which approximate the codon bias ratio of the overall human proteome; (2) the second class (Class II) disregards the codon bias ratio and appears to have originated by simple repetition of the same codon (or just a few codons); and (3) in all AARs (including Class I, Class II, and the in-betweens), the codons are chosen in a manner that precludes the formation of RNA secondary structure. It appears that AAR genes have evolved by balancing codon usage against mRNA secondary structure. The insights gained here should provide a better understanding of AAR evolution and may assist in designing synthetic genes.
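The Class I / Class II distinction is drawn here by inspection. As a rough quantitative proxy (an assumption, not the method used in this work), one could score each repeat by the share of its most frequent codon and by its total variation distance from a reference codon-usage table, such as one taken from the Codon Usage Database cited in Fig. 1. The function names, the 0.8 cutoff, and the toy reference values below are hypothetical and are not real human codon frequencies.

```python
from collections import Counter

def codon_fractions(seq: str) -> dict:
    """Split an in-frame coding sequence into codons; return each codon's fraction."""
    rna = seq.upper().replace("T", "U")  # accept DNA or mRNA letters
    if not rna or len(rna) % 3:
        raise ValueError("sequence must be non-empty and a multiple of 3 nt long")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}

def bias_distance(observed: dict, reference: dict) -> float:
    """Total variation distance: 0 means identical usage, 1 means no shared codons."""
    keys = set(observed) | set(reference)
    return 0.5 * sum(abs(observed.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)

def classify_repeat(seq: str, reference: dict, dominant_cutoff: float = 0.8):
    """Hypothetical rule: a repeat dominated by a single codon is called Class II-like."""
    observed = codon_fractions(seq)
    dominant = max(observed.values())
    label = "Class II-like" if dominant >= dominant_cutoff else "Class I-like"
    return label, dominant, bias_distance(observed, reference)

# Toy reference for Gln (CAA/CAG); NOT real human codon-usage values.
toy_gln_reference = {"CAA": 0.5, "CAG": 0.5}
print(classify_repeat("CAG" * 10, toy_gln_reference))    # ('Class II-like', 1.0, 0.5)
print(classify_repeat("CAGCAA" * 5, toy_gln_reference))  # ('Class I-like', 0.5, 0.0)
```

A single cutoff on the dominant codon is the simplest possible split; the "in-between" repeats mentioned above would sit near the cutoff, so in practice the two scores are probably more informative than the label.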

Keywords: Bioinformatics; Computational biology; Genetics; Structural biology.


Figures

Fig. 1
Synonymous codon patterns of selected single amino acid repeats (SAARs). The repeats were retrieved from the human proteome as described in Materials and Methods. In each example, the protein name (e.g. Fibrosin-1) or its HGNC (HUGO Gene Nomenclature Committee)-approved symbol (e.g. SKIDA1) is followed by the residue and its number of repeats (e.g. Ala19). Panels A, B, C, D, and E show, respectively, representative repeats of Ala, Gln, Pro, miscellaneous amino acids, and Arg; panel F shows the color codes used for the repeat codons and the percentage usage of each synonymous codon for a given amino acid in the human proteome, acquired from the Codon Usage Database, http://www.kazusa.or.jp/codon/. Note that in panel F the percentage values for the six-codon amino acids, Ser and Arg, do not add up to 100, as two codons of each have been omitted to conserve space. The vast majority of examples are uninterrupted SAARs; in a few cases, the interrupting amino acids/codons are written overhead (when space permitted) and indicated by white color. Two relatively complex repeat runs, SPT20HL1 (panel A) and RUNX2 (panel B), are marked with apparent microrepeat units and expansions. In each panel, the repeats are listed from top to bottom, from the most Class I type (i.e. diverse, multi-colored codons) to the most Class II type (i.e. identical, single-color codons), as judged by qualitative visual inspection.
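Fig. 1 distinguishes uninterrupted repeats from runs broken by interrupting residues. A minimal sketch for locating such interruptions, assuming an in-frame mRNA or DNA segment that covers only the repeat region, translates the segment with the standard genetic code and reports the longest uninterrupted run of the repeated residue. The helper names are hypothetical; the 64-letter string is the standard code with codons ordered by U, C, A, G at each position.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG; '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(seq: str) -> str:
    """Translate an in-frame sequence; a trailing partial codon is ignored."""
    rna = seq.upper().replace("T", "U")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

def repeat_summary(seq: str, residue: str):
    """Return the protein, the longest uninterrupted run of `residue`,
    and (1-based position, amino acid) for every interrupting residue."""
    protein = translate(seq)
    longest = current = 0
    interruptions = []
    for pos, aa in enumerate(protein, start=1):
        if aa == residue:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
            interruptions.append((pos, aa))
    return protein, longest, interruptions

# Hypothetical Gln run with one His (CAU) interruption.
print(repeat_summary("CAG" * 4 + "CAU" + "CAA" * 3, "Q"))
# ('QQQQHQQQ', 4, [(5, 'H')])
```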
Fig. 2
Hypothetical perfect RNA hairpins formed in DAARs. Nucleotide (mRNA) sequences consisting of complementary codons (UCU, AGA) for the two types of DAARs are illustrated by the Ser-Arg repeats (S)5(R)5 and (SR)5. RNA structure prediction was conducted by the MFE method as described in Materials and Methods. In this display, A:U base pairs are indicated by a single line and G:C pairs by a double line. Note the very similar thermodynamic stability (ΔG) of the two structures, which arises primarily from their identical base-pair composition and length.
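The complementarity behind Fig. 2 can be checked directly: both 30-nt sequences built from UCU (Ser) and AGA (Arg) equal their own reverse complement, and folding each back on itself gives the same numbers of A:U and G:C pairs. This sketch is a simple complementarity count, not the MFE prediction used in the figure; it pairs position i with position n - 1 - i and ignores the unpaired loop that any real hairpin requires, so it overstates the number of pairs. The function names are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def fold_back_pairs(rna: str) -> tuple:
    """Pair base i with base n-1-i (no loop) and count A:U and G:C pairs.
    G:U wobble pairs are deliberately not counted."""
    n = len(rna)
    au = gc = 0
    for i in range(n // 2):
        pair = {rna[i], rna[n - 1 - i]}
        if pair == {"A", "U"}:
            au += 1
        elif pair == {"G", "C"}:
            gc += 1
    return au, gc

s5r5 = "UCU" * 5 + "AGA" * 5   # (S)5(R)5
sr5 = "UCUAGA" * 5             # (SR)5
for name, seq in [("(S)5(R)5", s5r5), ("(SR)5", sr5)]:
    print(name, seq == reverse_complement(seq), fold_back_pairs(seq))
# (S)5(R)5 True (10, 5)
# (SR)5 True (10, 5)
```

The identical (10, 5) counts mirror the caption's point that the two structures share the same base-pair composition and length.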
Fig. 3
Schematic model of AAR formation. In this scheme (as in Fig. 1), the colored boxes indicate different but synonymous codons for the same amino acid of an AAR. All AARs may start with a short sequence of a few amino acids, operationally defined as a “monomeric unit”. The starting unit, as well as the final repeat produced by imperfect replication, may agree or disagree to various degrees with the organismal codon bias. DNA repeats can promote excision of units (indicated by the shorter arrow at the recombination step), but this is minimized by the imperfection of the repeats, which arises from conservative amino acid replacements and synonymous codon substitutions; the same imperfection also prevents the formation of secondary structural folds in the mRNA. The resultant codon distribution may regulate the translational kinetics and folding of the AAR protein, also contributing to the overall evolution of the AAR sequence.
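The model proposes that synonymous codon substitution during repeat expansion keeps the DNA repeat imperfect. A toy simulation of that idea, under purely illustrative assumptions (hypothetical function names, a fixed swap probability, and a codon list restricted to the two Gln codons), grows a poly-Gln run by copying the previous codon and occasionally switching to its synonym; the longest stretch of identical codons then serves as a crude measure of DNA-level repeat perfection. It does not model recombination, codon bias, or translation kinetics.

```python
import random

# Both Gln codons of the standard genetic code; other residues are omitted.
SYNONYMS = {"Q": ["CAA", "CAG"]}

def expand_repeat(start_codon, residue, copies, p_swap, rng):
    """Grow a repeat codon by codon: copy the previous codon, but with
    probability p_swap replace it by a different synonymous codon."""
    codons = [start_codon]
    for _ in range(copies - 1):
        nxt = codons[-1]
        if rng.random() < p_swap:
            nxt = rng.choice([c for c in SYNONYMS[residue] if c != nxt])
        codons.append(nxt)
    return codons

def longest_identical_run(codons):
    """Length of the longest stretch of identical consecutive codons."""
    if not codons:
        return 0
    best = cur = 1
    for prev, nxt in zip(codons, codons[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

rng = random.Random(0)  # fixed seed so runs are reproducible
perfect = expand_repeat("CAG", "Q", 20, 0.0, rng)    # pure codon repetition
imperfect = expand_repeat("CAG", "Q", 20, 0.3, rng)  # occasional synonymous swaps
# The first value is always 20; the second depends on the random draws.
print(longest_identical_run(perfect), longest_identical_run(imperfect))
```

Both runs encode the same 20-residue poly-Gln protein, but only the swapped version breaks up the identical DNA repeat.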
