Comparative Study
Bioinformatics. 2017 Apr 1;33(7):964-970.
doi: 10.1093/bioinformatics/btw748.

Improved VCF normalization for accurate VCF comparison


Arash Bayat et al. Bioinformatics.

Abstract

Motivation: The Variant Call Format (VCF) is widely used to store data about genetic variation. Variant calling workflows detect potential variants in large numbers of short sequence reads generated by DNA sequencing and report them in VCF format. To evaluate the accuracy of variant callers, it is critical to compare their output correctly against a reference VCF file containing a gold-standard set of variants. However, comparing VCF files is complicated: a single genomic variant can be represented in several different ways, so different software does not necessarily report it identically.
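A toy example makes the representation problem concrete. In the hypothetical reference `GCACACAT`, deleting one copy of the `CA` repeat can be written as at least three different VCF records, and all of them yield the same sample sequence. The Python sketch below checks that, and also shows conventional left-alignment and trimming of the kind performed by tools such as `vt normalize` or `bcftools norm`, which maps all three records to one canonical form. The `Variant` tuple and function names are illustrative assumptions, and this is the standard normalization, not the BAN method.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int  # 1-based position, as in the VCF POS column
    ref: str  # REF allele
    alt: str  # a single ALT allele, for simplicity


def apply_variant(reference: str, v: Variant) -> str:
    """Return the sequence obtained by applying one variant to the reference."""
    start = v.pos - 1
    if reference[start:start + len(v.ref)] != v.ref:
        raise ValueError(f"REF {v.ref!r} does not match the reference at {v.pos}")
    return reference[:start] + v.alt + reference[start + len(v.ref):]


def left_normalize(reference: str, v: Variant) -> Variant:
    """Left-align and trim a biallelic variant (vt-style normalization).

    Simplifying assumption: the variant never needs to be shifted past
    position 1, so an empty allele can always be padded with the base
    to its left.
    """
    pos, ref, alt = v
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Re-pad an empty allele with the preceding reference base (shifts left).
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leftmost bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return Variant(pos, ref, alt)


if __name__ == "__main__":
    reference = "GCACACAT"  # toy sequence containing a CA repeat
    # Three records that all delete one copy of "CA".
    for v in [Variant(1, "GCA", "G"), Variant(3, "ACA", "A"), Variant(5, "ACA", "A")]:
        print(v, apply_variant(reference, v), left_normalize(reference, v))
```

All three records produce the sample sequence `GCACAT` and normalize to `Variant(pos=1, ref='GCA', alt='G')`, so a comparator that matched records literally would treat them as different calls.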

Results: We introduce a VCF normalization method called Best Alignment Normalisation (BAN) that leads to more accurate VCF file comparison. BAN applies all the variants in a VCF file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back to the reference genome. Because the purpose of BAN is to obtain accurate results when VCF files are compared, we define a better normalization method as one that produces less disagreement between the outputs of different VCF comparators.
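The two BAN steps described above (apply all variants to obtain a sample sequence, then recall the variants by aligning that sequence back to the reference) can be sketched as follows. This is a hypothetical Python illustration, not the authors' bash script: the unit-cost global alignment, the traceback tie-breaking that favours leftmost gaps, and all names are assumptions. It also assumes a single haplotype, non-overlapping variants, and no indel at the very first reference base, and its quadratic alignment is only practical for short sequences.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    pos: int  # 1-based reference position, as in VCF
    ref: str
    alt: str


def build_sample(reference: str, variants: List[Variant]) -> str:
    """Step 1: apply every (non-overlapping) variant to the reference."""
    seq = reference
    # Apply right to left so earlier positions stay valid after each edit.
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        start = v.pos - 1
        if seq[start:start + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[start + len(v.ref):]
    return seq


def align(ref: str, sample: str, mismatch: int = 1, gap: int = 1) -> List[Tuple[str, str]]:
    """Step 2a: unit-cost global alignment; '-' marks a gap."""
    n, m = len(ref), len(sample)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return 0 if ref[i - 1] == sample[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i, j),
                             cost[i - 1][j] + gap,
                             cost[i][j - 1] + gap)
    # Trace back from the end, preferring the diagonal; this pushes gaps
    # inside repeats toward the left.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            cols.append((ref[i - 1], sample[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            cols.append((ref[i - 1], "-"))
            i -= 1
        else:
            cols.append(("-", sample[j - 1]))
            j -= 1
    return cols[::-1]


def call_variants(reference: str, cols: List[Tuple[str, str]]) -> List[Variant]:
    """Step 2b: turn runs of non-matching columns into VCF-style records."""
    variants: List[Variant] = []
    ref_pos = 0  # reference bases consumed so far
    k = 0
    while k < len(cols):
        if cols[k][0] == cols[k][1]:  # match column
            ref_pos += 1
            k += 1
            continue
        start, ref_allele, alt_allele, has_gap = ref_pos, "", "", False
        while k < len(cols) and cols[k][0] != cols[k][1]:
            r, s = cols[k]
            if r != "-":
                ref_allele += r
                ref_pos += 1
            if s != "-":
                alt_allele += s
            has_gap = has_gap or "-" in (r, s)
            k += 1
        if not has_gap:
            variants.append(Variant(start + 1, ref_allele, alt_allele))
        elif start == 0:
            raise ValueError("an indel at the first base is not handled in this sketch")
        else:
            anchor = reference[start - 1]  # VCF pads indels with the preceding base
            variants.append(Variant(start, anchor + ref_allele, anchor + alt_allele))
    return variants


def ban_normalize(reference: str, variants: List[Variant]) -> List[Variant]:
    """Apply the variants, then recall them by realigning to the reference."""
    return call_variants(reference, align(reference, build_sample(reference, variants)))
```

Because two VCF files that describe the same changes produce the same sample sequence, this sketch recalls them identically: `ban_normalize("GCACACAT", [Variant(5, "ACA", "A")])` and `ban_normalize("GCACACAT", [Variant(3, "ACA", "A")])` both return `[Variant(pos=1, ref='GCA', alt='G')]`.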

Availability and implementation: The BAN Linux bash script, along with the required software, is publicly available at https://sites.google.com/site/banadf16.

Contact: A.Bayat@unsw.edu.au.

Supplementary information: Supplementary data are available at Bioinformatics online.
