2013 May 8;9(10):511-7.
doi: 10.6026/97320630009511. Print 2013.

Sequences encoding identical peptides for the analysis and manipulation of coding DNA

Joaquín Sánchez. Bioinformation.

Abstract

The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported. Studying such sequences could directly reveal properties of coding DNA that are independent of peptide sequence. For practical purposes, SEIP might also be manipulated, for example for heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to determine whether they could be used to create new coding sequences. We attempted to replace human codons with E. coli codons via dicodon exchange but found that full replacement was not possible, which indicates robust species-specific dicodon tendencies. To test another form of codon replacement, we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and then reconstructed the GFP coding DNA with human tetrapeptide-coding sequences. The results provide proof of principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct, piece by piece, a protein-coding DNA with sequences from a different organism. The latter might be exploited in heterologous protein expression.

Keywords: codon allocation tendencies; codon pairs; green fluorescent protein; heterologous protein expression; intercodon dinucleotides; synonymous codons.
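
The extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The window length `k`, the function names, and the toy sequences are assumptions made for illustration; the abstract does not state how SEIP were delimited. The sketch translates in-frame coding sequences with the standard genetic code, indexes every peptide window of `k` residues, and keeps the peptides found in both species together with the DNA that encodes them.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons; a trailing partial codon is dropped."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def peptide_index(cds_list, k):
    """Map every k-residue peptide to the list of DNA windows that encode it."""
    index = defaultdict(list)
    for cds in cds_list:
        cs = codons(cds.upper())
        for i in range(len(cs) - k + 1):
            window = cs[i:i + k]
            peptide = "".join(CODON_TABLE.get(c, "X") for c in window)
            if "*" in peptide or "X" in peptide:
                continue  # skip windows with stop codons or ambiguous bases
            index[peptide].append("".join(window))
    return index


def extract_seip(cds_a, cds_b, k):
    """Return {peptide: (windows from species A, windows from species B)}."""
    index_a, index_b = peptide_index(cds_a, k), peptide_index(cds_b, k)
    return {p: (index_a[p], index_b[p]) for p in index_a.keys() & index_b.keys()}


def codon_counts(windows):
    """Absolute codon frequencies over a collection of coding windows."""
    return Counter(c for w in windows for c in codons(w))


# Toy input (invented sequences, both encode the peptide MAKL).
species_a = ["ATGGCCAAGCTG"]
species_b = ["ATGGCGAAACTG"]
seip = extract_seip(species_a, species_b, k=4)
```

Because both members of a pair encode the same peptide, any difference between `codon_counts` of the two sides reflects codon choice and not amino acid composition.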


Figures

Figure 1
Scheme of the procedure for replacing GFP codons with human codons using human sequences that encode identical tetrapeptides. Roman numerals in parentheses indicate the order of the steps. Drawings are not to scale. The horizontal bars represent the human ORFeome (red bar on top) and the collection of human proteins obtained by translating the ORFeome (blue; Roman numeral I). The green horizontal bar represents the GFP protein. The segmented green bar represents the 59 tetrapeptides that make up GFP, except for its last two amino acids. The red squares below the segmented green bar represent the multiple human sequences used to reconstruct the GFP coding sequence after a consensus was defined for each tetrapeptide-coding sequence (Roman numeral V).
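
The reconstruction scheme in Figure 1 can be sketched in Python as follows. This is a hedged illustration, not the published procedure. The caption does not say how the consensus was defined, so the sketch assumes a codon-by-codon majority vote among donor sequences encoding the same tetrapeptide. It also assumes that blocks with no donor match, and trailing residues that do not fill a complete tetrapeptide, keep their original codons. All function names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons; a trailing partial codon is dropped."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(codon_list):
    """Translate a list of codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(c, "X") for c in codon_list)


def donor_index(donor_cds_list, k=4):
    """Map each k-residue peptide to the donor codon windows that encode it."""
    index = defaultdict(list)
    for cds in donor_cds_list:
        cs = codons(cds.upper())
        for i in range(len(cs) - k + 1):
            window = cs[i:i + k]
            peptide = translate(window)
            if "*" not in peptide and "X" not in peptide:
                index[peptide].append(window)
    return index


def consensus(windows):
    """Assumed consensus rule: most frequent codon at each position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*windows)]


def reconstruct(target_cds, donor_cds_list, k=4):
    """Rebuild target_cds block by block from donor windows encoding the same peptide."""
    index = donor_index(donor_cds_list, k)
    target = codons(target_cds.upper())
    rebuilt = []
    for i in range(0, len(target), k):
        block = target[i:i + k]
        matches = index.get(translate(block)) if len(block) == k else None
        # Fall back to the original codons when no donor encodes this block.
        rebuilt.extend(consensus(matches) if matches else block)
    return "".join(rebuilt)


# Toy input (invented sequences): the target encodes MAKLG, the donors encode MAKL.
target = "ATGGCTAAACTTGGT"
donors = ["ATGGCCAAGCTG", "ATGGCCAAGCTC", "ATGGCGAAGCTG"]
rebuilt = reconstruct(target, donors)
```

A codon-wise vote is used here because a nucleotide-wise vote over synonymous codons could, in principle, assemble a codon for a different amino acid.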
Figure 2
Codon frequencies in sets of sequences encoding identical peptides. The human sequences in A and B belong to different sets. The origin of the codons is indicated above each graph. The x-axis shows codon sequences and the encoded amino acid (one-letter code). Methionine and tryptophan codons are omitted from both panels: each of these amino acids is encoded by a single codon, so the frequencies are identical. The y-axis shows absolute codon frequencies.
Figure 3
Procedure used to exchange synonymous codons in sequences. Arrows indicate that synonymous codons may be exchanged within each peptide-coding sequence but not with sequences encoding other peptides (forbidden symbol over arrows).
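
One reading of the exchange in Figure 3 is a permutation of codons among the positions that encode the same amino acid, carried out separately for each peptide-coding sequence. The Python sketch below follows that reading. It is an assumption about the procedure, not a description of the software used in the study, and the function name and toy sequence are hypothetical. The shuffle leaves the encoded peptide and the codon composition of the sequence unchanged, and alters only which codon sits next to which.

```python
import random
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons; a trailing partial codon is dropped."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def shuffle_synonymous(cds, rng):
    """Permute codons among positions encoding the same amino acid, within one sequence."""
    original = codons(cds.upper())
    positions = defaultdict(list)
    for i, codon in enumerate(original):
        positions[CODON_TABLE[codon]].append(i)
    shuffled = list(original)
    for indices in positions.values():
        pool = [original[i] for i in indices]
        rng.shuffle(pool)  # exchange only among synonymous positions
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)


# Toy input (invented sequence encoding AKAKA); a seeded generator keeps runs repeatable.
sequence = "GCCAAGGCGAAAGCT"
shuffled = shuffle_synonymous(sequence, random.Random(0))
```

Each sequence is shuffled on its own, so no codon crosses the boundary between independent peptides.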
Figure 4
Percent increase or decrease in intercodon dinucleotide frequencies after shuffling of synonymous codons in sequences encoding identical peptides in human and E. coli (A) or in human and D. melanogaster (B). The x-axis shows intercodon dinucleotide sequences. The y-axis shows the percent change in intercodon dinucleotide frequency. Standard deviations are shown above bars.
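
The quantity plotted in Figure 4 can be illustrated with a short Python sketch. It takes an intercodon dinucleotide to be the third base of one codon followed by the first base of the next, and it computes the change relative to the unshuffled count. Both function names are hypothetical, and the averaging over repeated shuffles that would produce the standard deviations in the figure is left out.

```python
from collections import Counter


def intercodon_dinucleotides(cds):
    """Count dinucleotides spanning codon boundaries (base 3 of a codon + base 1 of the next)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i - 1] + cds[i] for i in range(3, usable, 3))


def percent_change(before, after):
    """Percent change per dinucleotide, relative to the count before shuffling.

    Dinucleotides absent before shuffling are skipped because the
    relative change is undefined for them.
    """
    return {d: 100.0 * (after.get(d, 0) - n) / n for d, n in before.items() if n}


# Toy input (invented sequence): codons GCC, AAG, CTG give junctions CA and GC.
junctions = intercodon_dinucleotides("GCCAAGCTG")
change = percent_change(Counter(CA=2, GC=1), Counter(CA=1, GC=2))
```

Shuffling synonymous codons within a sequence keeps the number of codon junctions constant, so a change in counts is equivalent to a change in relative frequency.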
Figure 5
Procedure used to replace human codons with E. coli codons by exchanging pairs of synonymous codons in sequences coding for identical peptides. The exchange occurs between one copy of the human sequences (sequence on top) and 68 copies of the sequences from E. coli (symbolized by the two sequences at the bottom). The dotted line indicates the boundary between independent peptides. Arrows indicate the exchange of pairs of synonymous codons. The forbidden symbol over arrows indicates changes that are not allowed. Codon exchange downwards, i.e. towards the E. coli sequences, is not shown because its effects are virtually irrelevant given the disparity in the number of copies.
Figure 6
Codon composition of human sequences encoding identical peptides before and after shuffling, either as pairs of synonymous codons or as individual codons, for replacement of human codons by those of E. coli. The x-axis shows codon sequences and the encoded amino acid (one-letter code). The y-axis shows absolute frequency. Standard deviations are shown above the bars of shuffled sequences. For comparison, the codon composition of intact E. coli sequences is also shown.
Figure 7
Codon usage in Aequorea victoria GFP coding DNA after reconstruction with human coding sequences. Codon usage (y-axis) is expressed as a fraction of one. The x-axis shows codons and the encoded amino acid (one-letter code). A missing bar indicates that the codon is absent from the reconstructed GFP, the original GFP, or both. Codons for Met (ATG) and Trp (TGG) and the stop codon are omitted. To make the comparisons easier to read, panel A compares humanized GFP with human and panel B compares humanized GFP with the original GFP.
Figure 8
Intercodon dinucleotide frequencies in Aequorea victoria GFP coding DNA after reconstruction with human coding sequences. The x-axis shows intercodon dinucleotide sequences. The y-axis shows intercodon dinucleotide frequency as a fraction. To make the comparisons easier to read, panel A compares humanized GFP with human and panel B compares humanized GFP with the original GFP.
