Optimal spliced alignment of homologous cDNA to a genomic DNA template
- PMID: 10869013
- DOI: 10.1093/bioinformatics/16.3.203
Abstract
Motivation: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching.
Results: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity.
Availability: The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as a Web service at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively.
Contact: vbrendel@iastate.edu
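The idea of scoring sequence matching and splice site strength in a single optimization can be illustrated with a toy dynamic program. The sketch below is not the authors' algorithm or the C subroutine distributed with SplicePredictor and GeneSeqer; it is a minimal illustration under stated assumptions. The function names (`splice_scores`, `spliced_alignment_score`), the match, mismatch and gap values, and the use of log-probabilities assigned to GT/AG dinucleotides as "splice site strength" are all hypothetical choices made for this example. Leading and trailing genomic sequence is left unpenalized, and no minimum intron length is enforced.

```python
import math

NEG = float("-inf")


def splice_scores(genomic, p_site=0.5, p_background=1e-3):
    """Toy splice site model (an assumption of this sketch, not the paper's).

    donor[i]    = log-probability that an intron starts at genomic[i]
    acceptor[i] = log-probability that an intron ends at genomic[i]
    GT/AG dinucleotides get p_site; every other position gets p_background.
    """
    n = len(genomic)
    donor = [math.log(p_background)] * n
    acceptor = [math.log(p_background)] * n
    for i in range(n - 1):
        if genomic[i:i + 2] == "GT":
            donor[i] = math.log(p_site)         # intron may begin with this G
        if genomic[i:i + 2] == "AG":
            acceptor[i + 1] = math.log(p_site)  # intron may end with this G
    return donor, acceptor


def spliced_alignment_score(genomic, cdna, donor, acceptor,
                            match=2.0, mismatch=-2.0, gap=-3.0):
    """Best combined score for aligning all of cdna to a stretch of genomic.

    exon[i][j]:   best score for genomic[:i] vs cdna[:j], ending in an exon
    intron[i][j]: same prefixes, with genomic[i-1] lying inside an intron
    """
    n, m = len(genomic), len(cdna)
    exon = [[NEG] * (m + 1) for _ in range(n + 1)]
    intron = [[NEG] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        exon[i][0] = 0.0            # leading genomic sequence is free
    for j in range(1, m + 1):
        exon[0][j] = j * gap        # cDNA bases with no genomic partner
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open an intron at genomic[i-1] (pay donor score) or extend one.
            intron[i][j] = max(exon[i - 1][j] + donor[i - 1],
                               intron[i - 1][j])
            sub = match if genomic[i - 1] == cdna[j - 1] else mismatch
            exon[i][j] = max(
                exon[i - 1][j - 1] + sub,          # aligned pair
                exon[i - 1][j] + gap,              # genomic base unmatched
                exon[i][j - 1] + gap,              # cDNA base unmatched
                intron[i][j] + acceptor[i - 1],    # close intron at genomic[i-1]
            )
    # Trailing genomic sequence is free: take the best row in the last column.
    return max(exon[i][m] for i in range(n + 1))


# Two exons separated by a GT...AG intron; the cDNA is the two exons joined.
genomic = "ACGTAC" + "GTAAGTTTTTTTCAG" + "GATTACA"
cdna = "ACGTAC" + "GATTACA"
donor, acceptor = splice_scores(genomic)
print(spliced_alignment_score(genomic, cdna, donor, acceptor))
```

Because the donor and acceptor terms are added to the same score as the match and mismatch terms, a site with a low splice score can still be selected when the flanking sequence matches well, and a site with a high splice score can be selected despite weaker flanking similarity, which is the trade-off the abstract describes.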