BMC Bioinformatics. 2006 Jun 13;7:297.
doi: 10.1186/1471-2105-7-297.

Impact of RNA structure on the prediction of donor and acceptor splice sites

Sayed-Amir Marashi et al. BMC Bioinformatics.

Abstract

Background: Computational gene identification in genomic DNA sequences has become an important task in bioinformatics, and computational gene prediction tools are now essential components of every genome sequencing project. Splice site prediction is a key step in all gene structure prediction algorithms.

Results: We investigated the role of mRNA secondary structures and their information content in five vertebrate and plant splice site datasets. We selected 900-nucleotide sequences centered at each (real or decoy) donor and acceptor site, and predicted their corresponding RNA structures with the Vienna software. Then, based on whether each nucleotide is in a stem or not, the conventional four-letter nucleotide alphabet was translated into an eight-letter alphabet. Zero-, first- and second-order Markov models were selected as the signal detection methods. Compared with the four-letter alphabet, the eight-letter alphabet considerably increases the accuracy of both donor and acceptor site predictions for higher-order Markov models.
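The alphabet translation and Markov scoring described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several choices here are assumptions not stated in the abstract: the structure is given in dot-bracket notation, paired nucleotides are written in uppercase and unpaired ones in lowercase, the model is a single homogeneous first-order chain with add-one pseudocounts, and the function names `to_eight_letter`, `train_first_order`, and `llr_score` are hypothetical.

```python
import math

# Assumed eight-letter alphabet: uppercase = nucleotide in a stem (paired),
# lowercase = nucleotide not in a stem. The paper's actual symbols may differ.
ALPHABET = "ACGUacgu"


def to_eight_letter(seq, dot_bracket):
    """Merge a sequence and its dot-bracket structure into one string."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    return "".join(
        base if state in "()" else base.lower()
        for base, state in zip(seq, dot_bracket)
    )


def train_first_order(seqs, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate transition probabilities P(current | previous) from sequences.

    Assumption: one homogeneous chain shared by all positions; a
    position-specific model would keep one such table per position.
    """
    counts = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev, row in counts.items():
        total = sum(row.values())
        model[prev] = {cur: c / total for cur, c in row.items()}
    return model


def log2_likelihood(seq, model):
    """Sum of log2 transition probabilities (initial-state term omitted)."""
    return sum(math.log2(model[p][c]) for p, c in zip(seq, seq[1:]))


def llr_score(seq, real_model, decoy_model):
    """Positive scores favour the real-site model over the decoy model."""
    return log2_likelihood(seq, real_model) - log2_likelihood(seq, decoy_model)


if __name__ == "__main__":
    encoded = to_eight_letter("GGGAAACCC", "(((...)))")
    print(encoded)  # GGGaaaCCC
```

A candidate site would then be scored by encoding its window with `to_eight_letter` and comparing `llr_score` against a threshold; with the four-letter alphabet the same functions apply once `alphabet` is set to `"ACGU"` and the structure is ignored.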

Conclusion: Our results imply that RNA structure contains important data and future gene prediction programs can take advantage of such information.

Figures

Figure 1
Average number of structural changes in a 21-nucleotide window around 1000 donor GUs. See the text for details.
Figure 2
Distribution of predicted linear distances of base-paired nucleotides in RNA sequences. See text for details.
Figure 3
Log likelihood ratio (LLR with log-base-2) of formation of loop structure at different positions around splice sites in AtGS and HsGS datasets. The sequences are shown in 5'→3' direction. Asterisked positions are those positions that show a significant difference (p < 0.05 based on the test for differences of two binomial proportions) between the frequency of "loops" in real and decoy sites. 3' AtGS (A), 3' HsGS (B), 5' AtGS (C) and 5' HsGS (D).
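The two statistics named in the Figure 3 caption can be illustrated with a short sketch. The caption does not give the exact formulas, so the following rests on assumptions: the LLR at a position is taken to be the base-2 logarithm of the ratio of loop frequencies in real and decoy sites, and the significance test is the standard pooled two-proportion z-test with a two-sided p-value. The function names and the counts in the usage lines are hypothetical.

```python
import math


def loop_llr(loops_real, n_real, loops_decoy, n_decoy):
    """log2 ratio of loop frequencies at one position (assumed definition).

    Both loop counts must be non-zero, otherwise the ratio is undefined.
    """
    return math.log2((loops_real / n_real) / (loops_decoy / n_decoy))


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test for the difference of two binomial proportions.

    Returns (z, two-sided p-value). Assumes the pooled proportion is
    strictly between 0 and 1 so that the standard error is non-zero.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: loops at one position in 1000 real and 1000 decoy sites.
    llr = loop_llr(600, 1000, 300, 1000)
    z, p = two_proportion_z_test(600, 1000, 300, 1000)
    print(llr, z, p)
```

Under these assumptions, a position would be asterisked when the returned p-value falls below 0.05.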
