HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources
- PMID: 11752345
- PMCID: PMC99093
- DOI: 10.1093/nar/30.1.387
Abstract
HGVbase (Human Genome Variation database; http://hgvbase.cgb.ki.se, formerly known as HGBASE) is an academic effort to provide a high quality and non-redundant database of available genomic variation data of all types, mostly comprising single nucleotide polymorphisms (SNPs). Records include neutral polymorphisms as well as disease-related mutations. Online search tools facilitate data interrogation by sequence similarity and keyword queries, and searching by genome coordinates is now being implemented. Downloads are freely available in XML, Fasta, SRS, SQL and tagged-text file formats. Each entry is presented in the context of its surrounding sequence and many records are related to neighboring human genes and affected features therein. Population allele frequencies are included wherever available. Thorough semi-automated data checking ensures internal consistency and addresses common errors in the source information. To keep pace with recent growth in the field, we have developed tools for fully automated annotation. All variants have been uniquely mapped to the draft genome sequence and are referenced to positions in EMBL/GenBank files. Data utility is enhanced by provision of genotyping assays and functional predictions. Recent data structure extensions allow the capture of haplotype and genotype information, and a new initiative (along with BiSC and HUGO-MDI) aims to create a central repository for the broad collection of clinical mutations and associated disease phenotypes of interest.
