BMC Bioinformatics. 2012 Sep 17;13:237.
doi: 10.1186/1471-2105-13-237.

A pipeline for automated annotation of yeast genome sequences by a conserved-synteny approach

Estelle Proux-Wéra et al. BMC Bioinformatics. 2012.

Abstract

Background: Yeasts are a model system for exploring eukaryotic genome evolution. Next-generation sequencing technologies are poised to vastly increase the number of yeast genome sequences, both from resequencing projects (population studies) and from de novo sequencing projects (new species). However, the annotation of genomes presents a major bottleneck for de novo projects, because it still relies on a process that is largely manual.

Results: Here we present the Yeast Genome Annotation Pipeline (YGAP), an automated system designed specifically for new yeast genome sequences lacking transcriptome data. YGAP does automatic de novo annotation, exploiting homology and synteny information from other yeast species stored in the Yeast Gene Order Browser (YGOB) database. The basic premises underlying YGAP's approach are that data from other species already tells us what genes we should expect to find in any particular genomic region and that we should also expect that orthologous genes are likely to have similar intron/exon structures. Additionally, it is able to detect probable frameshift sequencing errors and can propose corrections for them. YGAP searches intelligently for introns, and detects tRNA genes and Ty-like elements.

Conclusions: In tests on Saccharomyces cerevisiae and on the genomes of Naumovozyma castellii and Tetrapisispora blattae newly sequenced with Roche-454 technology, YGAP outperformed another popular annotation program (AUGUSTUS). For S. cerevisiae and N. castellii, 91-93% of YGAP's predicted gene structures were identical to those in previous manually curated gene sets. YGAP has been implemented as a webserver with a user-friendly interface at http://wolfe.gen.tcd.ie/annotation.

Figures

Figure 1
Classifying pairs of consecutive HSPs. In this hypothetical example, a TBLASTN search using a 100 amino acid query protein produces two HSPs that are close together in the genome. We classify the relationship between two consecutive HSPs into one of four categories: (i) Frameshift. Two consecutive HSPs are in different frames but the distance between them is similar in both the query and the subject. (ii) Region of low similarity. Two consecutive HSPs are in the same frame, separated by a similar distance in both the query and the subject, with no stop codon between them. (iii) Intron. Two consecutive HSPs for which query and subject coordinates are dissimilar. This possibility is only considered if an existing gene from the same pillar and species group contains an intron. (iv) Duplication. If all other possibilities have been excluded, two consecutive HSPs suggest a probable local gene duplication.
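
The four-way decision in this caption can be sketched in Python as follows. This is only an illustration under stated assumptions, not YGAP's actual code: the `HSP` record, the function name `classify_hsp_pair`, the forward-strand coordinate convention, and the `tolerance_nt` threshold for what counts as a "similar" distance are all hypothetical. Whether a stop codon lies between the HSPs and whether the pillar and species group has a known intron are passed in as flags rather than computed.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One TBLASTN high-scoring pair (hypothetical record, forward strand only)."""
    q_start: int  # query coordinates in amino acids, 1-based inclusive
    q_end: int
    s_start: int  # subject coordinates in nucleotides, 1-based inclusive
    s_end: int
    frame: int    # reading frame on the subject: 1, 2 or 3


def classify_hsp_pair(up, down, stop_between, intron_expected, tolerance_nt=30):
    """Classify two consecutive HSPs (upstream, downstream) into one category.

    tolerance_nt is an assumed threshold; the caption gives no numeric value.
    """
    # Express the gap between the HSPs in nucleotides on both sequences.
    query_gap_nt = 3 * (down.q_start - up.q_end - 1)
    subject_gap_nt = down.s_start - up.s_end - 1
    similar = abs(subject_gap_nt - query_gap_nt) <= tolerance_nt
    same_frame = up.frame == down.frame

    # (i) Different frames, similar spacing in query and subject.
    if similar and not same_frame:
        return "frameshift"
    # (ii) Same frame, similar spacing, open reading frame in between.
    if similar and same_frame and not stop_between:
        return "low_similarity"
    # (iii) Dissimilar spacing, considered only when an intron is expected.
    if not similar and intron_expected:
        return "intron"
    # (iv) Everything else is treated as a probable local duplication.
    return "duplication"


if __name__ == "__main__":
    upstream = HSP(1, 40, 1000, 1119, frame=1)
    downstream = HSP(45, 100, 1133, 1300, frame=2)
    print(classify_hsp_pair(upstream, downstream,
                            stop_between=False, intron_expected=False))
```

In this sketch a same-frame pair with similar spacing but an intervening stop codon falls through to the last category, which follows the caption's "all other possibilities have been excluded" wording literally.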
Figure 2
Method for defining start and stop codon coordinates. The thick black bar indicates the location of the original BLAST HSP, and the thick grey bar indicates the gene coordinates reported by YGAP. M and asterisk (*) represent the locations of all possible start (ATG) and stop (TAA/TAG/TGA) codons in the same frame as the HSP. The start codon is chosen by searching around the beginning of the HSP as follows: (A) If the HSP (or the upstream HSP, in the case where a pair of HSPs is being considered) begins with a methionine codon, no change is made to the starting coordinate. (B) If the HSP does not begin with methionine, the ORF is extended to the furthest upstream methionine. (C) If during extension a stop codon is encountered before reaching a methionine, the software instead searches for a leading methionine within the first 45 nucleotides of the HSP. (D) If no suitable starting methionine is found using these steps, the original coordinates of the HSP are kept and the gene is tagged for manual inspection. Stop codons are found by walking downstream from the HSP, unless there is a stop codon within the HSP (in which case the HSP is trimmed accordingly).
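
Steps (A) to (D) and the downstream stop-codon walk can be sketched in Python as below. This is a minimal illustration, not the published implementation: the function names `choose_start` and `find_stop`, the 0-based forward-strand coordinates, and the `(position, needs_manual_inspection)` return shape are assumptions. Two details the caption leaves open are also assumed here: running off the start of the sequence without finding a methionine is handled like hitting a stop codon, and step (C) takes the first ATG in the 45-nucleotide window.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def choose_start(seq, hsp_start, window_nt=45):
    """Return (start_position, needs_manual_inspection) for an in-frame HSP.

    seq is the forward-strand DNA; hsp_start is the 0-based index of the
    first codon of the HSP.
    """
    seq = seq.upper()

    # (A) The HSP already begins with a methionine codon.
    if seq[hsp_start:hsp_start + 3] == "ATG":
        return hsp_start, False

    # (B) Walk upstream codon by codon, remembering the furthest ATG seen,
    # and stop at the first in-frame stop codon.
    furthest_atg = None
    pos = hsp_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            furthest_atg = pos
        pos -= 3
    if furthest_atg is not None:
        return furthest_atg, False

    # (C) No upstream methionine: look within the first window_nt
    # nucleotides of the HSP (its first codon was checked in step A).
    for pos in range(hsp_start + 3, hsp_start + window_nt, 3):
        if seq[pos:pos + 3] == "ATG":
            return pos, False

    # (D) Keep the original coordinate and flag the gene for inspection.
    return hsp_start, True


def find_stop(seq, start):
    """Walk downstream in frame from start; return the index just past the
    first stop codon, or None if no in-frame stop is found."""
    seq = seq.upper()
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 3
    return None


if __name__ == "__main__":
    dna = "TAAATGCCCATGGGGAAACCCTAG"
    start, flagged = choose_start(dna, hsp_start=15)
    print(start, flagged, find_stop(dna, start))
```

Scanning for the stop codon from the chosen start also covers the case of a stop inside the HSP, since the first in-frame stop ends the gene wherever it falls.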
Figure 3
Screenshots from the YGAP website. (A) Upload screen. (B) Results page including links to several types of output files and gene lists. (C) Mini-YGOB browser showing the new annotated species (here, T. blattae, a post-WGD species, in yellow/orange), compared to genomes of E. gossypii (non-WGD species, in green), S. cerevisiae (post-WGD species, in blue), and the Ancestral genome (in pink).
