2010 Jul 20:11:383.
doi: 10.1186/1471-2105-11-383.

The oligodeoxynucleotide sequences corresponding to never-expressed peptide motifs are mainly located in the non-coding strand

Giovanni Capone et al. BMC Bioinformatics. 2010.

Abstract

Background: We study the usage of specific peptide platforms in protein composition. Using the pentapeptide as a unit of length, we find that in the universal proteome many pentapeptides are heavily repeated (even thousands of times), whereas some are quite rare, and a small number do not appear at all. To understand the physico-chemical-biological basis underlying peptide usage at the proteomic level, in this study we analyse the energetic costs for the synthesis of rare and never-expressed versus frequent pentapeptides. In addition, we explore residue bulkiness, hydrophobicity, and codon number as factors able to modulate specific peptide frequencies. Then, the possible influence of amino acid composition is investigated in zero- and high-frequency pentapeptide sets by analysing the frequencies of the corresponding inverse-sequence pentapeptides. As a final step, we analyse the pentadecamer oligodeoxynucleotide sequences corresponding to the never-expressed pentapeptides.
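The pentapeptide census described above can be sketched as a sliding-window count. This is a minimal illustration, not the authors' pipeline: the function names `count_kmers` and `absent_kmers` and the toy sequences are assumptions made for the example, and ambiguous residue codes are simply skipped.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_VALID = set(AMINO_ACIDS)


def count_kmers(proteins, k=5):
    """Count every overlapping k-mer (k=5 gives pentapeptides) in the sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing non-standard or ambiguous residues.
            if all(residue in _VALID for residue in kmer):
                counts[kmer] += 1
    return counts


def absent_kmers(counts, k=5):
    """Yield every possible k-mer (20**k in total) that was never observed."""
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        if kmer not in counts:
            yield kmer


# Toy input (hypothetical sequences, not real proteome data).
toy_proteome = ["MKLLA", "LLAK"]
dipeptide_counts = count_kmers(toy_proteome, k=2)
```

With k=5 the space of possible peptides is 20^5 = 3,200,000, so enumerating the never-observed set is feasible in memory; the toy call uses k=2 only to keep the example small.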

Results: We find that only DNA context-dependent constraints (such as oligodeoxynucleotide sequence location in the minus strand, introns, pseudogenes, frameshifts, etc.) provide a coherent mechanistic platform to explain the occurrence of never-expressed versus frequent pentapeptides in the protein world.
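To make the strand-location idea concrete, the sketch below scans all six reading frames of a DNA sequence and reports where a given pentapeptide would be encoded. It is an illustrative assumption about how such a check could be done, not the method used in the study; the helper names and the 15-mer are hypothetical, and only the standard genetic code is considered.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the minus-strand sequence read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_peptide_frames(dna, peptide):
    """List (strand, frame) pairs whose translation contains the peptide."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            if peptide in translate(seq[frame:]):
                hits.append((strand, frame))
    return hits


# Hypothetical 15-mer: the plus strand reads FMHTP, while its reverse
# complement (the minus strand) reads WCMHK.
plus_strand = "TTTATGCATACACCA"
```

Here `find_peptide_frames(plus_strand, "WCMHK")` reports the peptide only on the minus strand, which is the kind of location the Results refer to: a sequence that exists in the DNA but is not read in a translated frame.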

Conclusions: This study is of importance in cell biology. Indeed, the rarity (or lack of expression) of specific 5-mer peptide modules implies the rarity (or lack of expression) of the corresponding n-mer peptide sequences that contain them (with n > 5), thereby possibly modulating protein compositional trends. Moreover, the data might further our understanding of the role exerted by rare pentapeptide modules as critical biological effectors in protein-protein interactions.
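The implication for longer peptides follows from containment: if a pentapeptide never occurs, no longer peptide that contains it can occur either. A minimal sketch of that check follows; the function name and the example set of absent pentapeptides are hypothetical placeholders, not data from the study.

```python
def first_absent_pentapeptide(peptide, absent_5mers):
    """Return the first never-occurring 5-mer found inside `peptide`, or None.

    A peptide longer than five residues that contains an absent 5-mer
    cannot itself occur in the proteome.
    """
    for i in range(len(peptide) - 4):
        window = peptide[i:i + 5]
        if window in absent_5mers:
            return window
    return None


# Hypothetical absent set, for illustration only.
example_absent = {"WCMHK"}
blocked = first_absent_pentapeptide("AAWCMHKGG", example_absent)
```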

Figures

Figure 1
Correlation between frequency distribution and standard heat of peptide formation of the 400 dipeptides present in the protein world. Panel A: The frequency of the 400 dipeptides present in the protein world varies widely, from a maximum of 21,927,296 occurrences (LL dipeptide) to a minimum of 407,573 (WC dipeptide). Panel B: The heat of formation in kJ/mol, as determined by the Spartan'06 software, also varies widely across the 400 dipeptides, from the highly exothermic DE dipeptide formation (-944.34 kJ/mol) to the endothermic CP dipeptide formation (1062.47 kJ/mol). Mean ΔG° values of -219.06 ± 237.83 kJ/mol and 56.34 ± 249.51 kJ/mol characterize the 50 most frequent dipeptides and the 50 least frequent ones, respectively.
Figure 2
Location of the 5-mer sets selected for physico-chemical analyses along the distribution curve of pentapeptide frequencies in the universal proteome. UniRef100, the most comprehensive protein dataset available [16-18; see also http://www.ebi.ac.uk/uniref/], was used. The arrows, lettered from b to k, mark the 5-mer sets selected for physico-chemical analyses, corresponding, in order, to 1, 4, 5, 50, 100, 341, 500, 1000, 1368 and 2500 occurrences. A further set, a, comprising the never-occurring pentapeptides, was also chosen.
Figure 3
Energetic cost of pentapeptides with different frequencies in the universal proteome. Panels A to E indicate pentapeptide sets that, in the universal proteome: A) are absent, B) are expressed only once, C) occur 100 times, D) occur 341 times, and E) occur 2500 times.
Figure 4
Statistical characterization of the energetic cost of pentapeptide sets with different frequencies in the universal proteome. The boxplots show the distribution of ΔG° values for each set of pentapeptides. The line within each box represents the median value. The top and bottom of each box represent the 75th and 25th percentile, respectively. The whiskers show the range of values that are not considered to be outliers. Outliers are plotted individually as plus signs. The p-value was 0.008, indicating that the means of the different sets are different, though clearly the magnitude of the differences is small.
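The boxplot elements named in the caption can be computed as follows. The caption does not state the outlier rule, so this sketch assumes the common 1.5 × IQR convention; the function name and the input values are made up for illustration and are not ΔG° data from the paper.

```python
import statistics


def boxplot_stats(values, whisker=1.5):
    """Return quartiles, whisker ends, and outliers for one set of values.

    Assumes the usual rule: points farther than `whisker` * IQR from the
    box are outliers; whiskers extend to the most extreme non-outliers.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inliers[0], inliers[-1]),
        "outliers": outliers,
    }


# Made-up numbers, chosen so that one point falls outside the fences.
example = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```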
Figure 5
Statistical characterization of hydrophobicity (A), bulkiness (B), and amino acid codon number (C) for pentapeptide sets with different frequencies in the universal proteome. The boxplots show the distribution of the values of each physico-biochemical factor for each set of pentapeptides. The line within each box represents the median value. The top and bottom of each box represent the 75th and 25th percentile, respectively. The whiskers show the range of values that are not considered to be outliers. Outliers are plotted individually as plus signs. The p-values among the different classes of 5-mers for hydrophobicity, bulkiness, and amino acid codon number were all less than 0.001, indicating in each case that the means of the different sets are different.
Figure 6
Effect of amino acid composition on pentapeptide frequency in the universal proteome. Frequency of: A) pentapeptides never-expressed in the universal proteome, and B) their inverse sequences. Frequency of: C) pentapeptides with 2500 occurrences in the proteome, and D) their inverse sequences.
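The inverse-sequence comparison controls for composition, because a pentapeptide and its inverse contain exactly the same residues. The sketch below reads "inverse sequence" as the reversed sequence, which is an assumption about the terminology; the function names and toy sequences are likewise hypothetical.

```python
from collections import Counter


def pentapeptide_counts(proteins, k=5):
    """Count overlapping k-mers across all sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def inverse_frequency_pairs(peptides, counts):
    """Map each peptide to (its own count, the count of its reversed sequence)."""
    return {p: (counts[p], counts[p[::-1]]) for p in peptides}


# Toy sequences (not real proteome data).
toy_counts = pentapeptide_counts(["ACDEFACDEF", "FEDCA"])
pairs = inverse_frequency_pairs(["ACDEF", "WWWWW"], toy_counts)
```

If composition alone determined frequency, the two numbers in each pair would be expected to be similar; a large gap between a peptide and its inverse points to order-dependent constraints instead.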

References

    1. Lucchese G, Stufano A, Trost B, Kusalik A, Kanduc D. Peptidology: short amino acid modules in cell biology and immunology. Amino Acids. 2007;33:703–707. doi: 10.1007/s00726-006-0458-z.
    2. Kanduc D, Capone G, Delfino VP, Losa G. The fractal dimension of protein information. Adv Stud Biol. 2010;2:53–62.
    3. Kanduc D, Lucchese A, Mittelman A. Individuation of monoclonal anti-HPV16 E7 antibody linear peptide epitope by computational biology. Peptides. 2001;22:1981–1985. doi: 10.1016/S0196-9781(01)00539-3.
    4. Mittelman A, Tiwari R, Lucchese G, Willers J, Dummer R, Kanduc D. Identification of monoclonal anti-HMW-MAA antibody linear peptide epitope by proteomic database mining. J Invest Dermatol. 2004;123:670–675. doi: 10.1111/j.0022-202X.2004.23417.x.
    5. Mittelman A, Lucchese A, Sinha AA, Kanduc D. Monoclonal and polyclonal humoral immune response to EC HER-2/NEU peptides with low similarity to the host's proteome. Int J Cancer. 2002;98:741–747. doi: 10.1002/ijc.10259.