Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W280-4.
doi: 10.1093/nar/gkl307.

GeneAlign: a coding exon prediction tool based on phylogenetical comparisons

Shu Ju Hsieh et al. Nucleic Acids Res. 2006.

Abstract

GeneAlign is a coding exon prediction tool that predicts protein coding genes by measuring the homology between a genomic sequence and related, already annotated sequences from other genomes. Identifying protein coding genes is one of the most important tasks in analyzing newly sequenced genomes. As the number of experimentally verified gene annotations grows, it becomes feasible to identify genes in a newly sequenced genome by comparing it with the annotated genes of phylogenetically close organisms. GeneAlign applies CORAL, a heuristic linear-time alignment tool, to determine whether regions flanked by candidate signals (initiation codon-GT, AG-GT and AG-STOP codon) are similar to annotated coding exons. Exploiting both the conservation of gene structures and the sequence homology between protein coding regions increases prediction accuracy. GeneAlign was tested on the Projector dataset of 491 human-mouse homologous sequence pairs. At the gene level, both the average sensitivity and the average specificity of GeneAlign are 81%; at the exon level, both are greater than 96%. The rates of missing exons and wrong exons are both below 1%. GeneAlign is a free tool available at http://genealign.hccvs.hc.edu.tw.
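
The three kinds of signal-flanked regions named in the abstract can be illustrated with a small sketch. This is not GeneAlign's implementation: the function names, the length limits and the brute-force pairing of signals are all assumptions made for illustration, reading frame is ignored, and the alignment step that CORAL performs against annotated exons is left out entirely.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_all(seq, motif):
    """Return every start index at which motif occurs in seq (overlaps allowed)."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits

def candidate_regions(seq, min_len=3, max_len=300):
    """Enumerate (kind, start, end) half-open regions flanked by candidate signals.

    Hypothetical helper: min_len and max_len are illustrative limits,
    not parameters taken from the paper.
    """
    seq = seq.upper()
    starts = find_all(seq, "ATG")      # initiation codons
    acceptors = find_all(seq, "AG")    # acceptor splice signals
    donors = find_all(seq, "GT")       # donor splice signals
    stops = [i for i in range(len(seq) - 2) if seq[i:i + 3] in STOP_CODONS]

    regions = []

    def add(kind, start, end):
        if min_len <= end - start <= max_len:
            regions.append((kind, start, end))

    # Initial exon: from the initiation codon up to (not including) a donor GT.
    for s in starts:
        for d in donors:
            add("initial", s, d)
    # Internal exon: from just after an acceptor AG up to a donor GT.
    for a in acceptors:
        for d in donors:
            add("internal", a + 2, d)
    # Terminal exon: from just after an acceptor AG through a stop codon.
    for a in acceptors:
        for t in stops:
            add("terminal", a + 2, t + 3)
    return regions

if __name__ == "__main__":
    for region in candidate_regions("ATGGCCGTAAGCCTGA"):
        print(region)
```

In a real predictor each candidate region would then be scored against annotated coding exons of the related genome, which is the step the abstract attributes to CORAL.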

Figures

Figure 1
Comparison of how prediction performance correlates with sequence homology for GeneWise, Projector and GeneAlign. The gene pairs of the Projector dataset were sorted into five classes by amino acid identity (<60, 60–70, 70–80, 80–90 and 90–100%), and performance was calculated for each class. Amino acid identities were obtained with a standard dynamic programming algorithm that aligns the two protein sequences encoded by each homologous gene pair. Sensitivity (Sn) and specificity (Sp) are defined as Sn = TP/(TP + FN) and Sp = TP/(TP + FP), where TP, FP and FN are the numbers of true positives, false positives and false negatives.
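
As a worked illustration of the two measures in the caption, the sketch below computes Sn and Sp at the exon level. It assumes that a predicted exon counts as a true positive only when both of its boundaries match an annotated exon exactly; that matching rule, the function name and the coordinates in the usage example are assumptions for illustration and are not taken from the paper.

```python
def exon_level_accuracy(predicted, annotated):
    """Return (Sn, Sp) for two collections of (start, end) exon coordinates.

    Assumption: an exon is a true positive only on an exact boundary match.
    """
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)   # predicted and annotated
    fp = len(predicted - annotated)   # predicted but not annotated
    fn = len(annotated - predicted)   # annotated but not predicted
    sn = tp / (tp + fn) if tp + fn else 0.0   # Sn = TP / (TP + FN)
    sp = tp / (tp + fp) if tp + fp else 0.0   # Sp = TP / (TP + FP)
    return sn, sp

if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    annotated = [(0, 6), (11, 16), (20, 30)]
    predicted = [(0, 6), (11, 16), (40, 50), (60, 70)]
    sn, sp = exon_level_accuracy(predicted, annotated)
    print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")
```

Note that Sp as defined here is the fraction of predictions that are correct, which is the convention the caption uses for gene prediction.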
