Comparative Study
Genome Res. 2000 Apr;10(4):529-38.
doi: 10.1101/gr.10.4.529.

Genie--gene finding in Drosophila melanogaster

M G Reese et al. Genome Res. 2000 Apr.

Abstract

A hidden Markov model-based gene-finding system called Genie was applied to the genomic Adh region in Drosophila melanogaster as part of the Genome Annotation Assessment Project (GASP). Predictions from three versions of the Genie gene-finding system were submitted: one based on statistical properties of coding genes, a second that included EST alignment information, and a third that integrated protein sequence homology information. All three programs were trained on the provided Drosophila training data. In addition, promoter assignments from an integrated neural network were submitted. The gene assignments overlapped >90% of the 222 annotated genes, and 26 possibly novel genes were predicted, some of which might be overpredictions. The system correctly identified the exon boundaries of 70% of the exons in cDNA-confirmed genes, and 77% of the exons with the addition of EST sequence alignments. The best of the three Genie submissions predicted 19 of the 43 annotated gene structures entirely correctly (44%). In the promoter category, only 30% of the transcription start sites could be detected, but by integrating this program as a sensor into Genie the false-positive rate could be reduced to 1/16,786 (0.006%). The results of the experiment on the long contiguous genomic sequence revealed some problems concerning gene assembly in Genie, and the results were used to improve the system. We show that Genie is a robust hidden Markov model system that allows for a generalized integration of information from different sources such as signal sensors (splice sites, start codon, etc.), content sensors (exons, introns, intergenic regions), and alignments of mRNA, EST, and peptide sequences. The assessment showed that Genie could effectively be used for the annotation of complete genomes from higher organisms.
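The percentages quoted in the abstract can be checked from the counts it gives. The following is a minimal sketch in Python; the variable names are illustrative and not taken from the paper, and the "at least 200 genes" figure is only an inference from the stated ">90% of the 222 annotated genes".

```python
# Counts quoted in the abstract; variable names are illustrative only.
correct_structures = 19      # gene structures predicted entirely correctly
annotated_structures = 43    # annotated gene structures assessed
annotated_genes = 222        # annotated genes in the Adh region

# Share of gene structures predicted entirely correctly (reported as 44%).
pct_correct = 100 * correct_structures / annotated_structures

# Promoter false-positive rate of 1/16,786, as a percentage (reported as 0.006%).
pct_false_positive = 100 * (1 / 16_786)

# ">90%" of 222 genes means strictly more than 199.8, so at least 200 genes.
# Integer arithmetic avoids floating-point rounding in the floor step.
min_overlapped_genes = (90 * annotated_genes) // 100 + 1

print(f"entirely correct structures: {pct_correct:.1f}%")
print(f"promoter false-positive rate: {pct_false_positive:.4f}%")
print(f"genes overlapped (lower bound): {min_overlapped_genes}")
```

Both reported percentages agree with the raw counts once rounded to the precision used in the abstract.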

Figures

Figure 1
A generalized hidden Markov model (GHMM) including frame constraints. (B) The beginning state; (J5′) the 5′ UTR content sensor; (S) the start codon signal sensor; (EI) the initial exon content sensor; (D) the 5′ splice site sensor; (A) the 3′ splice site sensor; (E) the internal exon content sensor; (I) the intron content sensor; (EF) the final exon content sensor; (T) the stop codon signal sensor; (F) the end state; (ES) the single exon gene content sensor. For multiple genes in genomic regions such as Adh, an additional arc loops from F to B and models the intergenic region, including the promoter sensor.
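The state labels in the caption can be read as nodes of a directed graph. The sketch below, in Python, is one hedged way to picture that graph. Only the F to B loop is stated in the caption; every other arc in `ASSUMED_ARCS` is an assumption based on ordinary eukaryotic gene structure, not a transcription of the figure. The frame constraints, sensor scores, and state durations of the actual model are not represented, and `is_allowed_path` is a hypothetical helper. The label J5′ is written with an ASCII apostrophe in the code.

```python
# Assumed arcs between the state labels used in the Figure 1 caption.
# Only F -> B comes from the caption; all other arcs are illustrative guesses.
ASSUMED_ARCS = {
    "B": {"J5'"},
    "J5'": {"S"},
    "S": {"EI", "ES"},   # multi-exon gene or single-exon gene
    "EI": {"D"},
    "D": {"I"},
    "I": {"A"},
    "A": {"E", "EF"},    # another internal exon, or the final exon
    "E": {"D"},
    "EF": {"T"},
    "ES": {"T"},
    "T": {"F"},
    "F": {"B"},          # loop for multiple genes, as stated in the caption
}


def is_allowed_path(path, arcs=ASSUMED_ARCS):
    """Return True if path starts at B and each consecutive pair is an assumed arc."""
    if not path or path[0] != "B":
        return False
    return all(nxt in arcs.get(cur, set()) for cur, nxt in zip(path, path[1:]))


single_exon = ["B", "J5'", "S", "ES", "T", "F"]
two_exon = ["B", "J5'", "S", "EI", "D", "I", "A", "EF", "T", "F"]
skips_intron = ["B", "J5'", "S", "EI", "EF", "T", "F"]  # no splice sites or intron

print(is_allowed_path(single_exon))
print(is_allowed_path(two_exon))
print(is_allowed_path(skips_intron))
```

Under these assumed arcs, the single-exon and two-exon paths are accepted, while the path that jumps from the initial exon straight to the final exon is rejected because it skips the splice site and intron states.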
