Introns form compositional clusters in parallel with the compositional clusters of the coding sequences to which they pertain
- PMID: 21132282
- DOI: 10.1007/s00239-010-9411-6
Abstract
This report studies the compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor-base-dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters, here termed triplet composons. A triplet composon comprises all triplets that contain the same set of distinct bases, in whatever order and number. The DNA sequence is read starting at any letter of the initial triplet and then moving, triplet to triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content of the gene sequences, in terms of composon usage frequencies, shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein-coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even though their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans have no counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
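The composon coding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the one-base sliding window used for the overlapping reading, and the choice to skip windows containing symbols other than A, C, G, T are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def composon(triplet):
    """Return the composon label of a triplet: its distinct bases, sorted.

    Triplets with the same set of distinct bases, in any order and
    number (e.g. AAG, AGA, GAG), map to the same label ("AG").
    """
    return "".join(sorted(set(triplet)))


# Enumerate all 64 triplets and collapse them to their composon labels.
# The distinct labels are the non-empty base subsets of size 1 to 3.
ALL_COMPOSONS = sorted({composon("".join(t)) for t in product(BASES, repeat=3)})


def composon_frequencies(seq):
    """Composon usage frequencies from an overlapping triplet reading.

    Assumption: the overlapping reading is modeled as a window of
    width 3 advanced one base at a time, so all frames contribute.
    Windows containing symbols outside A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if set(triplet) <= set(BASES):
            counts[composon(triplet)] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in ALL_COMPOSONS}
```

Under these assumptions, `len(ALL_COMPOSONS)` is 14 (4 single-base, 6 two-base, and 4 three-base composons), and `composon_frequencies("AAAC")` assigns 0.5 to `"A"` and 0.5 to `"AC"`, since the two windows are AAA and AAC. Each single-base composon covers 1 triplet, while each two-base and three-base composon covers 6, so an idealized sequence of independent, equiprobable bases would give expected frequencies of 1/64 and 6/64 respectively; this is one simple baseline against which observed usage could be compared.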
Similar articles
- Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit? J Mol Evol. 2016 Jun;82(6):247-50. doi: 10.1007/s00239-016-9746-8. Epub 2016 May 24. PMID: 27220874
- A Method for the Annotation of Functional Similarities of Coding DNA Sequences: the Case of a Populated Cluster of Transmembrane Proteins. J Mol Evol. 2017 Jan;84(1):29-38. doi: 10.1007/s00239-016-9763-7. Epub 2016 Nov 3. PMID: 27812751
- A relationship between GC content and coding-sequence length. J Mol Evol. 1996 Sep;43(3):216-23. doi: 10.1007/BF02338829. PMID: 8703087
- The compositional properties of human genes. J Mol Evol. 1991 Jun;32(6):493-503. doi: 10.1007/BF02102651. PMID: 1908020
- Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol. 2023 Oct;91(5):570-580. doi: 10.1007/s00239-023-10122-3. Epub 2023 Jun 16. PMID: 37326679 Review.
Cited by
- Conserved Critical Evolutionary Gene Structures in Orthologs. J Mol Evol. 2019 Apr;87(2-3):93-105. doi: 10.1007/s00239-019-09889-1. Epub 2019 Feb 28. PMID: 30815710
- Evolutionary conserved compositional structures hidden in genomes of the foot-and-mouth disease virus and of the human rhinovirus. Sci Rep. 2019 Nov 12;9(1):16553. doi: 10.1038/s41598-019-53013-8. PMID: 31719605 Free PMC article.
- Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit? J Mol Evol. 2016 Jun;82(6):247-50. doi: 10.1007/s00239-016-9746-8. Epub 2016 May 24. PMID: 27220874
- A Method for the Annotation of Functional Similarities of Coding DNA Sequences: the Case of a Populated Cluster of Transmembrane Proteins. J Mol Evol. 2017 Jan;84(1):29-38. doi: 10.1007/s00239-016-9763-7. Epub 2016 Nov 3. PMID: 27812751