Mycology. 2011 Oct 3;2(3):118-141.
doi: 10.1080/21501203.2011.606851.

Approaches to Fungal Genome Annotation

Brian J Haas et al. Mycology.

Abstract

Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center's production genome annotation environment.

Figures

Figure 1. Strand-specific RNA-Seq reads aligned to the Schizosaccharomyces japonicus genome as viewed in the Broad’s Integrative Genomics Viewer
Strand-specific RNA-Seq reads are shown aligned to the top strand (top) and bottom strand (center) separately. The left and right RNA-Seq paired fragment reads are colored red and light blue, respectively. The three reference gene structure annotations for this 10 kb region of the genome are shown at bottom, colored dark blue.
Figure 2. Hybrid approach to RNA-Seq-based transcript reconstruction leveraging genome alignment and de novo assembly
RNA-Seq reads are first aligned to the genome, then partitioned into disjoint regions of alignment coverage. Inchworm is leveraged to de novo assemble the read sequences into transcripts. The resulting transcripts are aligned to the genome using a conventional cDNA alignment tool, and PASA is leveraged to further assemble overlapping alignments and extract gene structure annotations.
Figure 3. Spliced nucleotide and protein alignments infer intron structures
A section of AAT alignments of homologous protein and EST sequences to the Neurospora crassa gene (shown as query) for alkaline phosphatase (NCU01376). This region of the alignment unambiguously identifies an intron within the gene structure; consensus splice sites are shown in bold.
Figure 4. ARGO genome annotation editor display
Shown is the evidence for the gene structure annotation of Neurospora crassa alkaline phosphatase (NCU01376) in the ARGO genome annotation editor. Evidence consists of, from top to bottom, AUGUSTUS, GeneId, FgeneSH, SNAP, GLIMMERHMM, and GENEMARK.hmm ab initio predictions, followed by GENEWISE predictions based on top matching homologous proteins, PASA assemblies of EST alignments (ESTs not shown), the EVidenceModeler consensus prediction, and the final annotated gene model for this locus. The intron boundaries that agree with the annotated gene model are highlighted as pink vertical bars. Positions of start and stop codons are shown as green and red vertical bars, respectively. The ab initio predictors AUGUSTUS, FgeneSH, SNAP, and GENEMARK.hmm all agree perfectly on the structure of this gene, whereas GeneId and GLIMMERHMM propose different structures. The PASA assemblies of high-quality EST alignments provide evidence for UTR annotations at both gene termini, extending upstream and downstream of the start and stop codons of the annotated gene model (pink model highlighted at bottom).
Figure 5. The Sybil Comparative Genomics Interface
A short region of synteny among orthologous genes of Aspergillus and related genomes is shown within the Sybil interface. Similarities and differences among the annotated gene structures become readily apparent, and many differences are found to represent artifacts rather than true evolutionary differences among related genes. Examples of the most striking discrepancies among annotated gene structures, involving different numbers of exons or different intron and exon lengths, are highlighted by red rectangles.
Figure 6. The Broad Institute Eukaryotic Genome Annotation Pipeline
Genome sequences are annotated by leveraging multiple sources of evidence for genes, including ab initio gene predictions and protein and transcript alignments, all of which are distilled into a consensus gene set. Gene products are named based on homology to proteins or domains of known function, manually refined, and ultimately released to public databases.
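
The Figure 2 caption describes partitioning genome-aligned RNA-Seq reads into disjoint regions of alignment coverage before de novo assembly. The following is a minimal Python sketch of that partitioning step only, not the authors' implementation. The function name `partition_coverage`, the `(contig, start, end)` tuple format, and the `max_gap` parameter are assumptions made for illustration; the source does not specify how regions are delimited.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: (contig, start, end), 1-based inclusive coordinates.
Alignment = Tuple[str, int, int]


def partition_coverage(
    alignments: Iterable[Alignment], max_gap: int = 0
) -> Dict[str, List[Tuple[int, int]]]:
    """Merge read alignments into disjoint covered regions per contig.

    Two alignments fall in the same region when the second starts no more
    than `max_gap` bases past the current region end (assumed rule).
    """
    by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for contig, start, end in alignments:
        by_contig[contig].append((start, end))

    regions: Dict[str, List[Tuple[int, int]]] = {}
    for contig, spans in by_contig.items():
        spans.sort()  # order by start, then end
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1] + max_gap:
                # Overlaps (or is within max_gap of) the current region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # Coverage gap: start a new disjoint region.
                merged.append([start, end])
        regions[contig] = [(s, e) for s, e in merged]
    return regions


# Toy alignments, invented for illustration only.
reads = [("chr1", 100, 150), ("chr1", 140, 200), ("chr1", 500, 560), ("chr2", 10, 60)]
regions = partition_coverage(reads)
```

In a pipeline like the one the caption outlines, the reads falling in each region would then be passed to the assembler separately; a nonzero `max_gap` would be one way to keep short coverage dips from splitting a single transcript.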
