Biol Direct. 2006 Apr 3;1:10.
doi: 10.1186/1745-6150-1-10.

Method of predicting splice sites based on signal interactions

Alexander Churbanov et al. Biol Direct. 2006.

Abstract

Background: Predicting and properly ranking canonical splice sites (SSs) is a challenging problem for the bioinformatics and machine learning communities. Any progress in SS recognition will lead to a better understanding of the splicing mechanism. We introduce several new approaches for combining a priori knowledge to improve SS detection. First, we design a new Bayesian SS sensor based on oligonucleotide counting. To further enhance prediction quality, we apply our new de novo motif detection tool, MHMMotif, to intronic ends and exons. We combine the elements found with the sensor information using a Naive Bayesian Network, as implemented in our new tool, SpliceScan.

Results: According to our tests, the Bayesian sensor outperforms the contemporary Maximum Entropy sensor for 5' SS detection. We report a number of putative Exonic (ESE) and Intronic (ISE) Splicing Enhancers found by the MHMMotif tool. T-test statistics on mouse/rat intronic alignments indicate that the detected elements are, on average, more conserved than other oligos, which supports our assumption of their functional importance. The tool has been shown to outperform the SpliceView, GeneSplicer, NNSplice, Genio and NetUTR tools on the test set of human genes. SpliceScan outperforms all contemporary ab initio gene structural prediction tools on the set of 5' UTR gene fragments.

Conclusion: The designed methods have many attractive properties compared to existing approaches. The Bayesian sensor, the MHMMotif program and the SpliceScan tool are freely available on our web site.

Reviewers: This article was reviewed by Manyuan Long, Arcady Mushegian and Mikhail Gelfand.
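
The abstract describes a Bayesian SS sensor based on oligonucleotide counting but gives no formula for it. The block below is a minimal sketch of one way such a sensor could work, not the authors' implementation: it counts position-specific k-mers in aligned true sites and decoys and sums the log-odds under a naive independence assumption. The function names, the window length, the value of k, the pseudocount and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def count_positional_kmers(windows, k):
    """Count each (position, k-mer) pair across equal-length aligned windows."""
    counts = Counter()
    for window in windows:
        for i in range(len(window) - k + 1):
            counts[(i, window[i:i + k])] += 1
    return counts


def train_sensor(sites, decoys, k=3, pseudocount=1.0):
    """Return a scoring function built from true-site and decoy windows.

    Hypothetical helper: the score is a sum of per-position log-odds,
    i.e. a naive Bayes combination of overlapping oligonucleotides.
    """
    site_counts = count_positional_kmers(sites, k)
    decoy_counts = count_positional_kmers(decoys, k)
    n_kmers = 4 ** k  # number of possible k-mers over A, C, G, T

    def log_odds(pos, kmer):
        # Pseudocounts keep unseen k-mers from producing log(0).
        p_site = (site_counts[(pos, kmer)] + pseudocount) / (
            len(sites) + pseudocount * n_kmers)
        p_decoy = (decoy_counts[(pos, kmer)] + pseudocount) / (
            len(decoys) + pseudocount * n_kmers)
        return math.log(p_site / p_decoy)

    def score(window):
        return sum(log_odds(i, window[i:i + k])
                   for i in range(len(window) - k + 1))

    return score


# Toy 9-nt donor windows (illustrative only, not data from the paper).
true_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
decoy_sites = ["TTGGTCCAT", "ACGGTTTGA", "GAGGTCTCC", "CCGGTGCTA"]

score = train_sensor(true_sites, decoy_sites, k=3)
print(score("CAGGTAAGT"), score("TTGGTCCAT"))
```

In this sketch a positive score favours a true site and a negative score favours a decoy. Ranking candidate sites by score is what would produce ROC curves like those the paper reports.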


Figures

Figure 1. Consensus motifs for donor and acceptor SSs. Y-axis indicates the strength of base composition bias based on information content.
Figure 2. Models of exon definition and ESS-ESE interaction.
Figure 3. Bayesian sensor histograms produced for 5' SS and 3' SS signals on the test set of 250 human genes.
Figure 4. ROC diagrams for Donor and Acceptor signals.
Figure 5. Study of cross-correlation performance dependency.
Figure 6. Study of sensor performance based on learning set size.
Figure 7. ISE motifs found in the vicinity of 3' SS (Figures 7(a)-7(g)) and 5' SS (Figures 7(h)-7(m)).
Figure 8. ESE motifs repetitively detected in our MHMMotif runs.
Figure 9. ROC diagrams for Donor and Acceptor applications.
Figure 10. Blocks placement within consensus.
Figure 11. LOD diagram for GGG signal, reported as an ISE (Figures 11(a), 11(b)). LOD diagram for 9G8 signals, reported as an ESE (Figures 11(c), 11(d)).
Figure 12. In our HHMM motif model, B denotes the background state (equiprobable emission of A, C, G, T). X is a special marker for sticky-end handling to ensure proper convolution patterns. A sticky end of 10 X's is automatically added to every sample sequence by our tool.
Figure 13. The MHMM model we use, where μ1, ..., μK are the mixing proportions of the components such that $\sum_{k=1}^{K} \mu_k = 1$.
Figure 14. Donor and acceptor histograms approximated with a mixture of Beta distributions.
Figure 15. Example of ISE signal interactions.
Figure 16. LOD diagrams for Donor and Acceptor signal interactions.
