Bioinformatics. 2015 Jul 1;31(13):2202-4.
doi: 10.1093/bioinformatics/btv112. Epub 2015 Feb 19.

Unified representation of genetic variants


Adrian Tan et al. Bioinformatics. 2015.

Abstract

A genetic variant can be represented in the Variant Call Format (VCF) in multiple ways. Inconsistent representation of variants across variant callers and analyses magnifies the discrepancies between them and complicates variant filtering and duplicate removal. We present vt normalize, a software tool that normalizes the representation of genetic variants in VCF. We formally define variant normalization as the consistent representation of genetic variants in an unambiguous and concise way, and derive a simple general algorithm to enforce it. We demonstrate the inconsistent representation of variants across existing sequence analysis tools and show that our tool facilitates the integration of diverse variant types and call sets.
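The algorithm itself is not reproduced in this abstract. The sketch below shows one way to enforce the two properties named in the Fig. 1 caption, left alignment and parsimony: repeatedly trim a base shared at the end of every allele, re-anchoring on the preceding reference base whenever an allele becomes empty, then trim bases shared at the start of every allele. It is an illustrative reconstruction under stated assumptions, not the vt source code; the function name `normalize`, the in-memory reference string and the toy sequence are hypothetical, and `pos` follows the 1-based VCF convention.

```python
def normalize(ref_seq, pos, alleles):
    """Left-align and trim a VCF-style variant (illustrative sketch, not vt's code).

    ref_seq: reference sequence as a plain string (hypothetical toy input).
    pos: 1-based position of the first base of the REF allele, as in VCF.
    alleles: list of allele strings, REF first, then the ALT alleles.
    Returns a (pos, alleles) pair.
    """
    alleles = list(alleles)
    start = pos - 1
    if ref_seq[start:start + len(alleles[0])] != alleles[0]:
        raise ValueError("REF allele does not match the reference sequence")

    changed = True
    while changed:
        changed = False
        # Right-trim: if every allele ends with the same base, drop that base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele is now empty, prepend the preceding reference base to all alleles.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                # Cannot extend past the start of the sequence; this sketch simply stops.
                break
            pos -= 1
            base = ref_seq[pos - 1]
            alleles = [base + a for a in alleles]
            changed = True

    # Left-trim: drop a shared leading base while every allele keeps at least 2 bases.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


# Hypothetical toy reference containing a "CA" repeat; one copy is deleted.
ref_seq = "GGCACACATT"
print(normalize(ref_seq, 7, ["CAT", "T"]))  # (2, ['GCA', 'G'])
```

Right-trimming with re-anchoring slides an indel leftward through a repeat one base at a time, and the final left-trim removes any padding that is no longer needed as an anchor, so differently written entries for the same variant converge on one representation.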

Availability and implementation: The source code is available for download at http://github.com/atks/vt. More detailed documentation is available at http://genome.sph.umich.edu/wiki/Variant_Normalization.

Contact: hmkang@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Example of VCF entries representing the same variant. The left panel aligns each allele to the reference genome, and the right panel shows the variant in VCF. (A) is not left-aligned, (B) is neither left-aligned nor parsimonious, (C) is not parsimonious, and (D) is normalized.
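To make the caption's two properties concrete, the sketch below checks them on four hypothetical entries for the same one-unit deletion in a toy `CA` repeat; these are not the sequences shown in Fig. 1. `apply_variant` confirms that each entry yields the same haplotype. An entry is treated as left-aligned when its alleles do not all end with the same base (otherwise the whole entry could slide one base left with allele lengths unchanged), and as parsimonious when no shared first or last base can be trimmed without emptying an allele. The reference string and all function names are assumptions for illustration.

```python
# Hypothetical toy reference; positions are 1-based as in VCF.
REF_SEQ = "GGCACACATT"


def apply_variant(ref_seq, pos, ref, alt):
    """Return the haplotype obtained by replacing REF with ALT at pos."""
    start = pos - 1
    assert ref_seq[start:start + len(ref)] == ref, "REF does not match reference"
    return ref_seq[:start] + alt + ref_seq[start + len(ref):]


def is_left_aligned(pos, alleles):
    """True if the entry cannot be shifted left while keeping allele lengths.

    A leftward shift represents the same variant only when every allele
    ends with the same base, and no shift is possible at position 1.
    """
    if pos == 1:
        return True
    return len({a[-1] for a in alleles}) > 1


def is_parsimonious(alleles):
    """True if no shared first or last base can be trimmed without emptying an allele."""
    if any(len(a) < 2 for a in alleles):
        return True
    same_first = len({a[0] for a in alleles}) == 1
    same_last = len({a[-1] for a in alleles}) == 1
    return not (same_first or same_last)


# Four hypothetical ways to write the same deletion, mirroring the caption's panels.
entries = {
    "A": (7, ["CAT", "T"]),     # parsimonious, not left-aligned
    "B": (6, ["ACAT", "AT"]),   # neither
    "C": (1, ["GGCA", "GG"]),   # left-aligned, not parsimonious
    "D": (2, ["GCA", "G"]),     # normalized
}

for label, (pos, alleles) in entries.items():
    hap = apply_variant(REF_SEQ, pos, alleles[0], alleles[1])
    print(label, hap, is_left_aligned(pos, alleles), is_parsimonious(alleles))
# A GGCACATT False True
# B GGCACATT False False
# C GGCACATT True False
# D GGCACATT True True
```

All four entries produce the haplotype `GGCACATT`, so a tool comparing raw VCF records would see four different variants where there is only one; only entry D satisfies both properties.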
