Nucleic Acids Res. 2005 Jan 1;33(Database issue):D459-65.
doi: 10.1093/nar/gki135.

The Vertebrate Genome Annotation (Vega) database

J L Ashurst et al. Nucleic Acids Res.

Abstract

The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) is designed as a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent, high-quality annotation of the published human chromosome sequences. Finished regions from other vertebrates, including mouse and zebrafish, can also be viewed. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across the vertebrate genomes, which should aid comparative analysis of orthologues across the different finished regions.

Figures

Figure 1
The VEGA annotation pipeline. The pipeline shown here is for human. The automated analysis for other species has slight differences. The searches are run on our computer farm and stored in an Ensembl MySQL database using the Ensembl analysis pipeline system (20). Nearly all searches and prediction algorithms are run on repeat-masked sequence, the exception being CpG island prediction [see cpgreport in the EMBOSS (21) application suite]. RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) is used to mask interspersed repeats, followed by TRF (22) to mask tandem repeats. Nucleotide sequence databases are searched with wuBLASTN (http://blast.wustl.edu), and significant hits are re-aligned to the unmasked genomic sequence using est2genome (23). The UniProt protein database (http://www.uniprot.org) is searched with wuBLASTX, and the accession numbers of significant hits are looked up in the Pfam database (24). The hidden Markov models for Pfam protein domains are aligned against the genomic sequence using Genewise (25) to provide annotation of protein domains (Halfwise in the figure). We also run a number of ab initio prediction algorithms: genscan (26) and fgenesh (27) for genes, tRNAscan (28) to find tRNA genes and Eponine TSS (29), which predicts transcription start sites. The annotators use the Otterlace interface to create and edit genes, which are stored in the Otter database (13). Where predicted transcript structures from Ensembl are available, these can be viewed from within the Otterlace interface and may be used as starting templates for gene curation. Annotation in the Otter database is submitted to the EMBL/GenBank/DDBJ nucleotide database. The database for the VEGA website is periodically created by a publishing process that involves the copying and reformatting of data from the Otter genes and automated pipeline databases.
Figure 2
Curated Locus Report giving information about the PAX2 locus on chromosome 10.
Figure 3
ContigView webpage from human chromosome 6 Vega displaying poly(A) signals/sites and SNPs associated with SLC29A1 and HSPCB loci.
Figure 4
Different chromosomes and regions annotated from the three different vertebrates currently available in Vega.
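
The masking order described in the Figure 1 caption (interspersed repeats first, then tandem repeats, with CpG island prediction run on the unmasked sequence) can be illustrated with a minimal Python sketch. This is not the Ensembl analysis pipeline or any Vega code. The function names `hard_mask` and `run_pipeline`, the interval representation, and the toy analysis functions are all hypothetical stand-ins chosen for illustration, and masking with `N` is an assumption about how masked bases are represented.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interval type: 0-based, half-open (start, end) coordinates.
Interval = Tuple[int, int]


def hard_mask(seq: str, intervals: List[Interval]) -> str:
    """Replace every base inside the given intervals with 'N'."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


def run_pipeline(
    seq: str,
    interspersed: List[Interval],
    tandem: List[Interval],
    analyses: Dict[str, Callable[[str], int]],
    unmasked_analyses: Set[str],
) -> Tuple[str, Dict[str, int]]:
    """Mask interspersed repeats, then tandem repeats, then run each analysis.

    Analyses named in `unmasked_analyses` see the original sequence;
    all others see the repeat-masked sequence.
    """
    masked = hard_mask(hard_mask(seq, interspersed), tandem)
    results = {}
    for name, analysis in analyses.items():
        target = seq if name in unmasked_analyses else masked
        results[name] = analysis(target)
    return masked, results


if __name__ == "__main__":
    # Toy stand-ins for real tools: count CG dinucleotides and masked bases.
    toy_analyses = {
        "cpg_islands": lambda s: s.count("CG"),
        "similarity_search": lambda s: s.count("N"),
    }
    masked_seq, out = run_pipeline(
        "ACGTTTTTTCGCGAAAA", [(4, 9)], [(9, 13)], toy_analyses, {"cpg_islands"}
    )
    print(masked_seq, out)
```

In a real setting the intervals would come from the output of tools such as RepeatMasker and TRF, and the analysis callables would wrap external programs rather than string counts.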

References

    1. Dunham,I., Shimizu,N., Roe,B.A., Chissoe,S., Hunt,A.R., Collins,J.E., Bruskiewich,R., Beare,D.M., Clamp,M., Smink,L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.
    2. Birney,E., Andrews,T.D., Bevan,P., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cuff,J., Curwen,V., Cutts,T. et al. (2004) An overview of Ensembl. Genome Res., 14, 925–928.
    3. Kent,W.J., Sugnet,C.W., Furey,T.S., Roskin,K.M., Pringle,T.H., Zahler,A.M. and Haussler,D. (2002) The human genome browser at UCSC. Genome Res., 12, 996–1006.
    4. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
    5. Mallon,A.-M., Wilming,L., Weekes,J., Gilbert,J.G.R., Ashurst,J., Peyrefitte,S., Matthews,L., Cadman,M., McKeone,R., Sellick,C.A. et al. (2004) Organization and evolution of a gene-rich region of the mouse genome: A 12.7-Mb region deleted in the Del(13)Svea36H mouse. Genome Res., 14, 1888–1901.