Brief Bioinform. 2016 Mar;17(2):346-51.
doi: 10.1093/bib/bbv051. Epub 2015 Jul 25.

VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files


Steven N Hart et al. Brief Bioinform. 2016 Mar.

Abstract

Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already present in the VCF and re-annotate variants using the annotation provided by that particular program. This precludes investigators who have collected information on variants from literature or other sources from including these annotations in the filtering and mining of variants. We have developed VCF-Miner, a graphical user interface-based stand-alone tool, to mine variants and annotations stored in the VCF. Powered by a MongoDB database engine, VCF-Miner enables the stepwise trimming of non-relevant variants. The grouping feature implemented in VCF-Miner can be used to identify somatic variants by contrasting variants in tumor and in normal samples or to identify recessive/dominant variants in family studies. The tool is not limited to human data and can be extended to non-diploid organisms. It also supports copy number variants or any other variant type supported by the VCF specification. VCF-Miner can be used on a personal computer or large institutional servers and is freely available for download from http://bioinformaticstools.mayo.edu/research/vcf-miner/.
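The idea of filtering on annotations that are already present in the VCF, and trimming variants step by step, can be illustrated with a minimal Python sketch. This is not VCF-Miner's implementation (the tool is GUI-based and backed by MongoDB); it only assumes the standard tab-delimited VCF record layout with a semicolon-separated INFO column. The function names and the annotation keys `GENE` and `CURATED` are hypothetical, chosen for illustration.

```python
def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag-style entries map to True."""
    annotations = {}
    if info_field in ("", "."):
        return annotations
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        annotations[key] = value if sep else True
    return annotations


def parse_record(line):
    """Split one VCF data line into its fixed columns plus parsed INFO."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    return {
        "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
        "ALT": alt, "QUAL": qual, "FILTER": filt,
        "INFO": parse_info(info),
    }


def apply_filters(records, filters):
    """Apply (name, predicate) filters in order.

    Returns the surviving records and a running tally of how many
    variants remain after each step.
    """
    remaining = list(records)
    history = [("(no filter)", len(remaining))]
    for name, predicate in filters:
        remaining = [r for r in remaining if predicate(r)]
        history.append((name, len(remaining)))
    return remaining, history


if __name__ == "__main__":
    # Made-up records; GENE and CURATED are hypothetical user annotations.
    lines = [
        "1\t100\t.\tA\tG\t50\tPASS\tGENE=BRCA1;CURATED",
        "1\t200\t.\tC\tT\t50\tPASS\tGENE=TP53",
        "2\t300\t.\tG\tA\t50\tq10\tGENE=BRCA1",
    ]
    records = [parse_record(l) for l in lines]
    filters = [
        ("FILTER is PASS", lambda r: r["FILTER"] == "PASS"),
        ("GENE is BRCA1", lambda r: r["INFO"].get("GENE") == "BRCA1"),
    ]
    kept, history = apply_filters(records, filters)
    for name, count in history:
        print(f"{name}\t{count}")
```

Because each filter only reads what is already in the INFO column, annotations collected from the literature or other sources are used as-is rather than being replaced by a re-annotation step.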

Keywords: VCF; analysis; bioinformatics; genomics; software; user interface.


Figures

Figure 1.
Screenshot of VCF-Miner. The left panel shows a running tabulation of filters applied and the number of variants remaining. A pop-up dialog appears when the user clicks the ‘Add Filter’ button. The right panel consists of a tabular representation of the results. Users can choose which columns to show and hide, and when ready, a tab-delimited file of the selected filtered data and annotations can be exported.
Figure 2.
Custom logic filtering. In this figure, we demonstrate how to construct filters across groups of samples. Group 1 consists of nine samples. One could restrict variants to those present in Group 1 using the default setup. Changing the genotype option to heterozygous would return only variants that are heterozygous in at least one sample. To return only variants that are heterozygous in all nine samples, the sample status would be changed to ‘In all samples’. The alternate allele depth filter allows the user to specify the minimum number of reads supporting a variant, provided the VCF contains an AD field (see text for more details).
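The group logic described in this caption can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not VCF-Miner's code: it assumes diploid `GT` strings such as `0/1` or `1|1`, an `AD` value already parsed into a list of integers with the reference depth first, and it treats a sample without `AD` as failing a depth filter. All function and parameter names are hypothetical.

```python
def classify_genotype(gt):
    """Classify a diploid GT string (e.g. '0/1', '1|1'); anything else is 'missing'."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "missing"
    if alleles[0] != alleles[1]:
        return "heterozygous"
    return "homozygous_ref" if alleles[0] == "0" else "homozygous_alt"


def sample_matches(call, genotype="either", min_alt_depth=0):
    """Check one sample's call (dict with 'GT' and optionally 'AD')."""
    kind = classify_genotype(call["GT"])
    if genotype == "either":
        # Default: the variant is present in the sample in any zygosity.
        carries = kind in ("heterozygous", "homozygous_alt")
    else:
        carries = kind == genotype
    if not carries:
        return False
    if min_alt_depth > 0:
        ad = call.get("AD")
        if not ad or len(ad) < 2:
            return False  # assumption: no AD field means the depth filter fails
        return max(ad[1:]) >= min_alt_depth
    return True


def group_matches(calls, group, genotype="either", status="any", min_alt_depth=0):
    """Apply the per-sample check across a group.

    status='any' requires at least one matching sample;
    status='all' requires every sample in the group to match.
    """
    if not group:
        return False
    results = [sample_matches(calls[s], genotype, min_alt_depth) for s in group]
    return all(results) if status == "all" else any(results)
```

For a variant with calls `{"s1": {"GT": "0/1", "AD": [10, 8]}, "s2": {"GT": "1/1", "AD": [0, 20]}}`, calling `group_matches(calls, ["s1", "s2"], genotype="heterozygous")` keeps the variant because `s1` is heterozygous, whereas adding `status="all"` rejects it because `s2` is homozygous for the alternate allele.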

