BMC Bioinformatics. 2005 Feb 10:6:25.

doi: 10.1186/1471-2105-6-25.

Integrating alternative splicing detection into gene prediction

Sylvain Foissac¹, Thomas Schiex

Affiliation

¹ Unité de Biométrie et Intelligence Artificielle, INRA, 31326 Castanet Tolosan, France. foissac@toulouse.inra.fr

PMID: 15705189
PMCID: PMC550657
DOI: 10.1186/1471-2105-6-25

Abstract

Background: Alternative splicing (AS) is now considered a major contributor to transcriptome/proteome diversity, and it cannot be neglected when annotating a new genome. Despite considerable progress in the accuracy of computational gene prediction, reliably predicting AS variants when local experimental evidence of them exists remains an open challenge for gene finders.

Results: We have used a new integrative approach that incorporates AS detection into ab initio gene prediction. This method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs) and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGène. Given a genomic sequence and a set of aligned transcripts, this new version identifies the transcripts carrying evidence of alternative splicing events and provides, in addition to the classical optimal gene prediction, alternative optimal predictions (among those consistent with the detected AS events). A single gene can thus receive multiple annotations, each predicted variant being supported by transcript evidence (though not necessarily with full-length coverage).

Conclusions: This automatic combination of experimental data analysis and ab initio gene finding offers an ideal integration of alternatively spliced gene prediction inside a single annotation pipeline.

Figures

**Figure 1**
**EST/cDNA alignments on the spl7 gene region**. Thick lines represent matches and dotted lines represent gaps. The two full-length cDNAs that provide the two correct reference gene structures are shown above the genomic sequence. Arrows indicate the start and stop codons. The ESTs T04465 and AI995153 present inconsistent splicing profiles and are labeled as *incompatible*.

**Figure 2**
**EuGène's directed acyclic graph for a short example sequence**. For simplicity, only the forward strand is considered. The DNA sequence is shown above the graph. Horizontal tracks represent the different possible annotations: intergenic (bottom), 5' and 3' UTR, exon in the three frames, and intron in three phases (the phase of an intron is defined by the splicing position in the last codon of the previous exon). On each track, two vertices represent each nucleotide. These two vertices are linked horizontally by a contents edge and a transition edge (see the text and Figure 3 for details). Dotted arrows show occurrences of biological signals (such as start/stop codons and donor/acceptor splice sites). They produce additional transition edges at the corresponding positions. Since this version of EuGène does not include any promoter or polyA site prediction tool, transitions from intergenic to UTR and vice versa are allowed at every nucleotide position. Every consistent gene structure can be represented by a path connecting the initial and terminal vertices.

**Figure 3**
**Detail of EuGène's directed acyclic graph and algorithm**. The zoomed region contains the first two nucleotides of the example sequence of Figure 2 (C at position i - 1 and A at position i) and two annotation tracks (UTR5' for j and exon in frame 2 for j + 1). The contents edges c connect the l vertices to the following r vertices of the same track. Transition edges t are either horizontal, linking the r vertices to the l vertices of the same track, or transversal, linking the r vertices to all possible l vertices according to the occurrence of a biological signal in the sequence. In this example, a transversal transition edge allows the transition from the UTR5' track at position i - 1 to the exonic track at position i, because the A nucleotide at position i is the first nucleotide of a potential start codon ATG. The dynamic programming algorithm used in EuGène determines, for each vertex r, which vertex precedes r in the optimal path. In this example, at position i on track j, the best path arriving from the left has only one possible origin, which fixes its weight. On track j + 1, two origins are possible, and the best path is assigned the lower of the two candidate weights.
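The predecessor bookkeeping described in this caption can be illustrated with a minimal sketch. It is not EuGène's implementation: the function `best_path`, the track names, and the cost tables are hypothetical, and the model is simplified to one vertex per track and position instead of the paired l and r vertices of the figure. Horizontal transitions are assumed to be free, and transversal transitions exist only where a signal entry is supplied.

```python
from math import inf

def best_path(n_pos, tracks, content_cost, switch_cost):
    """Lowest-weight path through a layered track graph (illustrative only).

    content_cost[i][t]: weight of annotating position i with track t.
    switch_cost: {(i, src, dst): weight} for transversal transitions
    permitted when entering position i (for example at a start codon).
    Staying on the same track is always allowed at no extra weight.
    """
    best = [{t: inf for t in tracks} for _ in range(n_pos)]
    prev = [{t: None for t in tracks} for _ in range(n_pos)]
    for t in tracks:
        best[0][t] = content_cost[0][t]

    for i in range(1, n_pos):
        for dst in tracks:
            # Horizontal origin: same track at the previous position.
            candidates = [(best[i - 1][dst], dst)]
            # Transversal origins: only where a signal allows the switch.
            for src in tracks:
                w = switch_cost.get((i, src, dst))
                if w is not None:
                    candidates.append((best[i - 1][src] + w, src))
            weight, origin = min(candidates)  # keep the lower weight
            best[i][dst] = weight + content_cost[i][dst]
            prev[i][dst] = origin

    # Backtrack from the cheapest final vertex using stored predecessors.
    end = min(tracks, key=lambda t: best[-1][t])
    path = [end]
    for i in range(n_pos - 1, 0, -1):
        path.append(prev[i][path[-1]])
    path.reverse()
    return best[-1][end], path
```

With two tracks and a single permitted switch at position 1, the exon track at that position has two candidate origins and keeps the cheaper one, which mirrors the track j + 1 case in the caption.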

**Figure 4**
**Extension of EuGène's graph by a PGS to incorporate a single alternative transcript alignment**. From the main graph (bottom) described in Figure 2, a Parallel Graph Subunit (PGS) is built (above) by duplicating the whole graph section involved in the EST alignment (shown between the graphs). The gene structure evidence provided by the alignment is taken into account in the PGS by forbidding the intergenic track all along the alignment, intronic tracks at match positions (light grey), and exonic tracks at gap positions (dark grey). Dotted arrows represent the two scans of the algorithm: the forward scan from left to right and the backward scan from right to left. An optimal prediction is obtained at the junction point in the PGS. Figure not to scale.
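The three constraints listed in this caption can be written down as a small sketch. The track names, the block representation (half-open `(start, end)` match intervals), and the function `forbidden_tracks` are assumptions made for illustration; they are not taken from EuGène's code.

```python
INTERGENIC = "intergenic"
EXON_TRACKS = ("exon_f1", "exon_f2", "exon_f3")
INTRON_TRACKS = ("intron_p0", "intron_p1", "intron_p2")

def forbidden_tracks(blocks):
    """Map each genomic position spanned by an alignment to forbidden tracks.

    blocks: sorted, non-overlapping (start, end) match blocks, end exclusive.
    The space between two consecutive blocks is treated as an alignment gap.
    UTR tracks are left unconstrained, since the caption does not mention them.
    """
    forbidden = {}
    for k, (start, end) in enumerate(blocks):
        # Match positions: no intergenic, no intron.
        for pos in range(start, end):
            forbidden[pos] = {INTERGENIC, *INTRON_TRACKS}
        # Gap positions up to the next block: no intergenic, no exon.
        if k + 1 < len(blocks):
            for pos in range(end, blocks[k + 1][0]):
                forbidden[pos] = {INTERGENIC, *EXON_TRACKS}
    return forbidden
```

In a dynamic programming scan, a forbidden track at a position would typically be given an infinite weight, so that no optimal path inside the duplicated section can use it.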

**Figure 5**
**Integration of several incompatible ESTs in EuGène-M's graph and algorithm**. A: EST alignments (plain lines represent exons, dotted lines represent introns) on a genomic sequence (thick line). Each displayed EST is *incompatible* with at least one other. B: Multiple extensions of EuGène's graph model after processing these alignments. Each PGS (Figure 4) contains the information provided by its source EST. The dotted arrows show the progression of the algorithm through the resulting graph during the first scan, from left to right.
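One possible way to flag a pair of alignments as *incompatible* is sketched below. The criterion is an assumption, not a definition taken from this page: two alignments are treated as incompatible when, inside the genomic region spanned by both, some position is a match in one and a gap in the other. The function names and the half-open `(start, end)` block representation are hypothetical.

```python
def match_positions(blocks):
    """Set of genomic positions covered by match blocks (end exclusive)."""
    return {p for start, end in blocks for p in range(start, end)}

def incompatible(a, b):
    """True if alignments a and b disagree on match/gap status somewhere
    within the region that both of them span (assumed criterion)."""
    lo = max(a[0][0], b[0][0])    # start of the shared span
    hi = min(a[-1][1], b[-1][1])  # end of the shared span (exclusive)
    shared = set(range(lo, hi))   # empty when the spans do not overlap
    return (match_positions(a) & shared) != (match_positions(b) & shared)
```

Under this criterion, an EST whose exon extends past the splice site used by another EST is flagged, while a shorter EST that agrees with a longer one over their common span is not.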

Cited by

yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes.
Wilkerson MD, Schlueter SD, Brendel V. Genome Biol. 2006;7(7):R58. doi: 10.1186/gb-2006-7-7-r58. PMID: 16859520.
Using ESTs to improve the accuracy of de novo gene prediction.
Wei C, Brent MR. BMC Bioinformatics. 2006 Jul 3;7:327. doi: 10.1186/1471-2105-7-327. PMID: 16817966.
mGene: accurate SVM-based gene finding with an application to nematode genomes.
Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, Krüger N, Sonnenburg S, Rätsch G. Genome Res. 2009 Nov;19(11):2133-43. doi: 10.1101/gr.090597.108. Epub 2009 Jun 29. PMID: 19564452.
nGASP: the nematode genome annotation assessment project.
Coghlan A, Fiedler TJ, McKay SJ, Flicek P, Harris TW, Blasiar D; nGASP Consortium; Stein LD. BMC Bioinformatics. 2008 Dec 19;9:549. doi: 10.1186/1471-2105-9-549. PMID: 19099578.
Identification of alternative 5'/3' splice sites based on the mechanism of splice site competition.
Xia H, Bi J, Li Y. Nucleic Acids Res. 2006;34(21):6305-13. doi: 10.1093/nar/gkl900. Epub 2006 Nov 10. PMID: 17098928.
