Comparative Study
Genome Res. 2003 Mar;13(3):496-502.
doi: 10.1101/gr.424203.

SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model

Marina Alexandersson et al. Genome Res. 2003 Mar.

Abstract

Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.

Figures

Figure 1.
Fourteen thousand bp from the HoxA cluster showing the HoxA2 and HoxA3 genes. The top half of the figure consists of predictions and annotations for the 5′ → 3′ strand and the bottom half for the 3′ → 5′ strand. The tracks shown are: RefSeq annotations, GENSCAN, TWINSCAN, SGP-2, and SLAM predictions, Repeats masked by RepeatMasker (A. Smit and P. Green, unpubl.), TBLASTX alignments, and SLAM and VISTA CNS annotations. The figure was created using gff2ps by J.F. Abril and R. Guigó, available at http://www1.imim.es/software/gfftools/GFF2PS.html.
Figure 2.
A GPHMM for alignment and prediction of exons using genomic DNA from two different organisms. The shaded states are the typically less-conserved intergene and intron states, each producing either a single base or a gap in each organism. The use of self-transitions models their state durations as geometric. The unshaded states (all of which are exons) will all have duration 1, as they have no self-transitions; however, they are generalized and produce exon pairs according to some predetermined joint distribution. (A) In order to avoid the prediction of coding exons in all conserved regions, it was necessary to introduce conserved noncoding states (CNS). Each intron and intergene state consists of two parts: an I-state for modeling long unrelated noncoding regions, and a CNS state for modeling interspersed conserved domains. (B) The modeling of coding exon states in pairs required the construction of a specialized PHMM, consisting of match/mismatch (M), insertion (I), and deletion states (D), which was used to assign probabilities to exon pairs based on alignments in protein space using an appropriate evolutionary model.
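
The three-state PHMM mentioned in panel (B) can be illustrated with a minimal Viterbi sketch. This is not SLAM's implementation: it works directly on DNA rather than in protein space, it uses no evolutionary model, and the function name `pair_hmm_viterbi` and the parameters `delta` (gap open), `epsilon` (gap extend), and `p_match` (probability that an aligned pair is identical) are hypothetical values chosen only for illustration.

```python
import math

NEG_INF = float("-inf")


def pair_hmm_viterbi(x, y, delta=0.2, epsilon=0.4, p_match=0.9):
    """Most probable alignment of x and y under a toy M/I/D pair HMM.

    M emits an aligned pair, I emits a base of x against a gap,
    D emits a base of y against a gap. All parameters are assumptions.
    Returns (aligned_x, aligned_y, log_probability).
    """
    # Log transition probabilities (direct I <-> D moves are not allowed).
    log_mm = math.log(1 - 2 * delta)   # M -> M
    log_mg = math.log(delta)           # M -> I or M -> D
    log_gg = math.log(epsilon)         # I -> I or D -> D
    log_gm = math.log(1 - epsilon)     # I -> M or D -> M
    # Log emission probabilities over the DNA alphabet.
    log_same = math.log(p_match / 4)         # 4 identical pairs
    log_diff = math.log((1 - p_match) / 12)  # 12 mismatching pairs
    log_gap = math.log(0.25)                 # single base against a gap

    n, m = len(x), len(y)
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MID"}
    v["M"][0][0] = 0.0  # by convention the model starts in M
    back = {}

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = log_same if x[i - 1] == y[j - 1] else log_diff
                score, prev = max(
                    (v["M"][i - 1][j - 1] + log_mm, "M"),
                    (v["I"][i - 1][j - 1] + log_gm, "I"),
                    (v["D"][i - 1][j - 1] + log_gm, "D"),
                )
                v["M"][i][j] = score + emit
                back["M", i, j] = prev
            if i > 0:
                score, prev = max(
                    (v["M"][i - 1][j] + log_mg, "M"),
                    (v["I"][i - 1][j] + log_gg, "I"),
                )
                v["I"][i][j] = score + log_gap
                back["I", i, j] = prev
            if j > 0:
                score, prev = max(
                    (v["M"][i][j - 1] + log_mg, "M"),
                    (v["D"][i][j - 1] + log_gg, "D"),
                )
                v["D"][i][j] = score + log_gap
                back["D", i, j] = prev

    # Trace back from the best-scoring end state.
    best, state = max((v[s][n][m], s) for s in "MID")
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        prev = back[state, i, j]
        if state == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "I":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        state = prev
    return "".join(reversed(ax)), "".join(reversed(ay)), best
```

Calling `pair_hmm_viterbi("ACGTTGCA", "ACGTGCA")` returns two gapped strings of equal length plus the log probability of the best state path. In a generalized pair HMM, a score of this kind would be attached to a whole exon pair emitted by a single exon state, rather than to one column at a time.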
