BMC Bioinformatics. 2005 Oct 5;6:244.
doi: 10.1186/1471-2105-6-244.

ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

Paola Bonizzoni et al. BMC Bioinformatics. 2005.

Abstract

Background: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems, hence the need to develop novel strategies.

Results: We propose a method for the detection of splice sites, based on a novel multiple genome-EST alignment algorithm. To avoid the limitations of splice site prediction (mainly over-prediction) that arise from aligning each EST independently to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It differs from other methods based on BLAST-like tools in that it incorporates entirely new ad hoc procedures for accurate and computationally efficient transcript alignment and uses dynamic programming to refine intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility.

Conclusion: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires less computation time to process a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.

Figures

Figure 1
The figure illustrates two gene-factorizations of the genomic sequence G, into 7 and 4 pseudo-exons. Let S1, S2 and S3 be EST sequences in S agreeing with the genomic sequence G, where S1 = ABDEF, S2 = ABCDE and S3 = BDEFG, and each letter in {A, B, C, D, E, F, G} denotes a sequence (A). In (B) and (C) two alternative EST-genome alignments of sequences S1, S2 and S3 are represented: each EST factorization of Si associated with the EST-genome alignment is shaded. Pseudo-exons in the gene-factorization are colored white, while introns are in grey. Segments labelled by letters represent regions of the genomic sequence that align to a substring of the input sequence of the corresponding letter. Note that an approach that aligns each sequence S1, S2 and S3 to G independently, one after the other, may produce the gene-factorization <A, B, C, D, F, E, G> consisting of 7 pseudo-exons (B), while the one minimizing the number of pseudo-exons provides only 4 pseudo-exons (C). Indeed, there are EST factorizations of each Si that are compatible or variant compatible with the gene-factorization GE = <AB, C, DE, FG>. More precisely, <AB, DE, F> is an EST-factorization of S1 that is compatible with GE. Then <AB, C, DE> is an EST-factorization of S2 compatible with GE. Finally, <B, DE, FG> is an EST-factorization of S3 compatible with GE (C).
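The compatibility claims in the Figure 1 caption can be checked with a small sketch. The caption does not define compatibility formally, so the rule used here is an assumption inferred from its three examples: factors must map to pseudo-exons in left-to-right order, the first factor may be a suffix of its pseudo-exon, the last factor may be a prefix, and internal factors must match a pseudo-exon exactly. The names `fits` and `is_compatible` are hypothetical, and "variant compatible" factorizations are not modelled.

```python
from typing import List

# Toy data from the Figure 1 caption: each letter stands for a sequence block.
ESTS = {"S1": "ABDEF", "S2": "ABCDE", "S3": "BDEFG"}
GE = ["AB", "C", "DE", "FG"]  # gene-factorization with 4 pseudo-exons (panel C)


def fits(factor: str, exon: str, is_first: bool, is_last: bool) -> bool:
    """Assumed rule: end factors may cover only part of a pseudo-exon."""
    if is_first and is_last:
        return factor in exon
    if is_first:
        return exon.endswith(factor)
    if is_last:
        return exon.startswith(factor)
    return factor == exon


def is_compatible(est: str, factors: List[str], gene_exons: List[str]) -> bool:
    """Check that `factors` spell out `est` and map, in order, onto pseudo-exons."""
    if "".join(factors) != est:
        return False
    pos = 0  # index of the next pseudo-exon that may still be used
    last_index = len(factors) - 1
    for i, factor in enumerate(factors):
        # Greedily take the earliest pseudo-exon that this factor fits.
        while pos < len(gene_exons) and not fits(
            factor, gene_exons[pos], i == 0, i == last_index
        ):
            pos += 1
        if pos == len(gene_exons):
            return False
        pos += 1
    return True


# EST factorizations listed in the caption.
CAPTION_FACTORIZATIONS = {
    "S1": ["AB", "DE", "F"],
    "S2": ["AB", "C", "DE"],
    "S3": ["B", "DE", "FG"],
}

for name, factors in CAPTION_FACTORIZATIONS.items():
    print(name, is_compatible(ESTS[name], factors, GE))
```

Under this assumed rule, a factorization such as <A, BDE, F> of S1 is rejected, because no pseudo-exon of GE ends with A alone.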
Figure 2
Location of a new EST internal factor si+1 given the previously computed factors s2, ..., si. (a) Consecutive sequence components c1 ... cj are tested to find the first one that allows the identification of a genomic region that optimally aligns factor si+1 (i.e. alignment extension on one or both sides of the component): such a region is determined in (b) by the component cj. Panel (b) shows that some intervening positions (sequence x) may occur between factors si and si+1. Indeed, in this case the placement of si+1 gives the correct right end of the previous factor si, since the larger factor [formula] inducing canonical splice sites on the genomic sequence can be optimally aligned before si+1, thus leading to an optimal location of both si and si+1.
Figure 3
Example of intron detection in the human ATP1B1 (UG:Hs.291196) gene without (A) or with (B) the refinement of exon-intron boundaries. The first row shows the genomic sequence aligned to the EST sequences (below). In (A) four different introns are detected (A, B, C, D), which can be merged into only two (A, D) in (B). Absolute coordinates (NCBI 35 assembly) are shown for each intron, and acceptor/donor splice sites are shown on a black background.
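To illustrate why boundary refinement can merge introns, the sketch below slides a candidate intron left or right and keeps only the placements that leave the spliced sequence unchanged and have canonical GT...AG ends. This is a simple sliding heuristic written for illustration, not ASPIC's dynamic programming procedure; the genome string, the coordinates and the names `spliced` and `canonical_shifts` are all hypothetical.

```python
from typing import List

# Hypothetical toy genome: exon "ACCAG", intron "GTAAGTCCCTTTCAG", exon "GCCTA".
GENOME = "ACCAG" + "GTAAGTCCCTTTCAG" + "GCCTA"


def spliced(genome: str, start: int, end: int) -> str:
    """Sequence left after removing the intron genome[start:end]."""
    return genome[:start] + genome[end:]


def canonical_shifts(genome: str, start: int, end: int, max_shift: int = 3) -> List[int]:
    """Return shifts d for which the intron [start+d, end+d) gives the same
    spliced sequence as [start, end) and has GT...AG boundaries."""
    reference = spliced(genome, start, end)
    shifts = []
    for d in range(-max_shift, max_shift + 1):
        s, e = start + d, end + d
        if s < 0 or e > len(genome):
            continue  # shifted intron would fall outside the genome
        if spliced(genome, s, e) != reference:
            continue  # shifting changed the transcript, so it is not equivalent
        if genome[s:s + 2] == "GT" and genome[e - 2:e] == "AG":
            shifts.append(d)
    return shifts


# An alignment that places the intron one base too far to the right
# (non-canonical TA...GG ends) can be normalized by shifting it back.
print(canonical_shifts(GENOME, 6, 21))
```

Two alignments that report the same intron at slightly different coordinates collapse to a single intron once both are normalized in this way, which is the kind of merging shown between panels (A) and (B).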
Figure 4
Example of intron boundaries detected for the human AHCYL1 gene by AceView and ASPIC. The hypothetical novel intron predicted by AceView (July 2003 release) with non-canonical splice sites can be reduced to a known intron by a single A-insertion. Intron coordinates refer to Ensembl release 26.35.1.
Figure 5
Snapshot of the ASPIC output for the gene HNRPR (human chromosome 1). The Table View (A) lists all detected introns, their coordinates and the number of supporting ESTs. The Alignment View (B) shows the alignment between genomic and EST sequences around splice sites. The Graphical View (C) provides a general scheme of the splicing pattern. The Transcript View (D) shows the minimum set of different transcripts compatible with the detected splicing patterns.
