Comparative Study
Genomics Proteomics Bioinformatics. 2006 Nov;4(4):230-7.
doi: 10.1016/S1672-0229(07)60003-5.

Comparative analysis of splice site regions by information content

T Shashi Rekha et al. Genomics Proteomics Bioinformatics. 2006 Nov.

Abstract

We applied concepts from information theory to a comparative analysis of donor (gt) and acceptor (ag) splice site regions in the genes of five organisms, calculating the mutual information content (relative entropy) over selected blocks of nucleotides. In all the organisms studied, both regions showed the same pattern: the information content decreases as the block size increases. This result suggests that the information required for splicing might be contained in a consensus of approximately 6-8 nt at both regions. We infer from our study that although the nucleotides flanking the splice sites show some degree of conservation, a certain level of variability is still tolerated, so that splicing can occur normally even when base pairing is incomplete. We also suggest that this variability can be compensated for by the recognition of different splice sites by different spliceosomal factors.
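The abstract does not state the formula behind the relative entropy values. As a minimal sketch, the Python below assumes the definition commonly used for block-derived substitution matrices, H = sum over pairs of q_ij * log2(q_ij / e_ij), where q_ij is the observed frequency of aligned nucleotide pair (i, j) and e_ij is the frequency expected from the background composition. The function name `pair_relative_entropy` is hypothetical, and no sequence weighting or clustering is applied; the study's actual procedure may differ.

```python
import math
from collections import Counter
from itertools import combinations


def pair_relative_entropy(block):
    """Relative entropy (in bits) of aligned-pair frequencies in a block.

    `block` is a list of equal-length aligned sequences. This assumes the
    BLOSUM-style definition; it is not taken from the paper itself.
    """
    pair_counts = Counter()
    for col in range(len(block[0])):
        column = [seq[col].lower() for seq in block]
        # Count every unordered pair of residues within the column.
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each nucleotide, derived from the pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    entropy = 0.0
    for (a, b), freq in q.items():
        # Expected frequency of an unordered pair under independence.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        entropy += freq * math.log2(freq / expected)
    return entropy
```

Under this definition, a fully conserved block scores high, and a block whose pair frequencies match the background expectation scores zero.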

Figures

Fig. 1
Illustrations of the construction of three different block databases for donor (A) and acceptor (B) splice sites. The splice sites are represented as donor (gt) and acceptor (ag) sites and the central dinucleotides (gt/ag) are aligned with 2, 4, or 6 nt taken on both sides. The three blocks are constructed for 6 (gt±2, ag±2), 10 (gt±4, ag±4), and 14 (gt±6, ag±6) nt, respectively. Note that the given sequences are for illustration only and are arbitrary. The exon sequences are represented as uppercase letters, and the intron sequences along with the splice site dinucleotides are given as lowercase letters. The regions enclosed within the boxes are used for the computations of the substitution matrices.
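The block construction described in this caption can be sketched as a windowing step around the central dinucleotide. The function name `splice_block`, its parameters, and the example sequence are illustrative assumptions and do not come from the paper.

```python
def splice_block(sequence, site_index, flank, dinucleotide="gt"):
    """Return the dinucleotide at `site_index` plus `flank` nt on each side.

    `site_index` is the 0-based position of the first base of the splice
    site dinucleotide (gt for donor sites, ag for acceptor sites).
    """
    if sequence[site_index:site_index + 2].lower() != dinucleotide:
        raise ValueError("no %r at position %d" % (dinucleotide, site_index))
    start = site_index - flank
    end = site_index + 2 + flank
    if start < 0 or end > len(sequence):
        raise ValueError("sequence too short for the requested flank")
    return sequence[start:end]


# Arbitrary example: exon in uppercase, intron in lowercase.
example = "ACGCAGgtaagtat"
blocks = {flank: splice_block(example, 6, flank) for flank in (2, 4, 6)}
# blocks[2] is the 6 nt block, blocks[4] the 10 nt block, blocks[6] the 14 nt block.
```

Applying the same function to every annotated site, with `dinucleotide="ag"` for acceptors, would give one block database per flank size.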
Fig. 2
The mutual information content (relative entropy) calculated for donor (A; left column) and acceptor (B; right column) splice sites in block sizes of 6 (gt±2, ag±2), 10 (gt±4, ag±4), and 14 (gt±6, ag±6) nt for the genes of the five organisms studied. The boundaries of the boxes represent the 25th (lower) and 75th (upper) percentiles. The horizontal line within each box represents the median. The error bars show the 10th (bottom) and 90th (top) percentiles. The distributions are highly skewed, and the 90th percentile values are comparatively high in every case. The median values show relatively little variation between the three blocks studied. All the graphs are plotted on the same scale for ease of visual comparison.
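The five summary values behind each box in this kind of plot can be computed as follows. This is a minimal sketch: the names `percentile` and `box_summary` are hypothetical, and linear interpolation between ranks is an assumed convention, since the caption does not say which percentile method was used.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def box_summary(values):
    """Whisker ends (10th, 90th), box edges (25th, 75th), and median."""
    return {
        "p10": percentile(values, 10),
        "p25": percentile(values, 25),
        "median": percentile(values, 50),
        "p75": percentile(values, 75),
        "p90": percentile(values, 90),
    }
```

In a right-skewed distribution such as the one the caption describes, the gap between `p90` and `median` is much larger than the gap between `median` and `p10`.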
