Bioinformatics. 2019 Dec 15;35(24):5370-5371.
doi: 10.1093/bioinformatics/btz560.

VariantQC: a visual quality control report for variant evaluation

Melissa Y Yan et al. Bioinformatics.

Abstract

Summary: Large-scale genomic studies produce millions of sequence variants, generating datasets far too massive for manual inspection. To ensure variant and genotype data are consistent and accurate, it is necessary to evaluate variants prior to downstream analysis using quality control (QC) reports. Variant call format (VCF) files are the standard format for representing variant data; however, generating summary statistics from these files is not always straightforward. While tools to summarize variant data exist, they generally produce simple text file tables, which still require additional processing and interpretation. VariantQC fills this gap as a user-friendly, interactive visual QC report that generates and concisely summarizes statistics from VCF files. The report aggregates and summarizes variants by dataset, chromosome, sample and filter type. The VariantQC report is useful for high-level dataset summary and quality control, and it helps flag outliers. Furthermore, because VariantQC operates on VCF files, it can be easily integrated into many existing variant pipelines.

Availability and implementation: DISCVRSeq's VariantQC tool is freely available as a Java program, with the compiled JAR and source code available from https://github.com/BimberLab/DISCVRSeq/. Documentation and example reports are available at https://bimberlab.github.io/DISCVRSeq/.
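
To make concrete the kind of aggregation the abstract describes (variant counts by chromosome and by filter type), the following is a minimal Python sketch. It is not VariantQC's implementation, which is a Java program. The function name `summarize_vcf` and the example records are hypothetical, and the sketch assumes plain tab-delimited VCF text with the standard CHROM and FILTER columns.

```python
from collections import Counter


def summarize_vcf(lines):
    """Tally variant records by contig and by FILTER value.

    `lines` is any iterable of VCF text lines. This is an illustrative
    helper, not part of VariantQC or DISCVRSeq.
    """
    per_contig = Counter()
    per_filter = Counter()
    for line in lines:
        # Skip blank lines, meta-information (##) and the header (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, filt = fields[0], fields[6]  # columns 1 and 7 of a VCF record
        per_contig[chrom] += 1
        # FILTER may list several semicolon-separated filter names.
        for name in filt.split(";"):
            per_filter[name] += 1
    return per_contig, per_filter


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t10\tLowQual\t.",
    "chrY\t300\t.\tG\tA\t60\tPASS\t.",
]
per_contig, per_filter = summarize_vcf(example)
print(dict(per_contig))
print(dict(per_filter))
```

A per-sample stratification would additionally need to parse the genotype columns, which this sketch omits for brevity.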

Figures

Fig. 1.
Representative VariantQC Report. (A) HTML report for the ‘By Sample’ stratification, displaying summary statistics as interactive tables and bar graphs. The left gray panel lists the four primary stratifications, each with its corresponding set of summary reports beneath. (B) Using the ‘SNP/Indel Summary’ table from the ‘By Sample’ stratification, samples were sorted by the number of singleton SNVs to quickly identify the outlier m00106, which was flagged in red, along with two other potential outliers. (C) Using the ‘Variants Per Contig’ table from the ‘By Sample’ stratification, samples were sorted by the number of chrY variants, allowing easy detection of a potential QC issue for sample m00757, which was listed as female but has a high number of chrY variants.
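
The two checks illustrated in panels (B) and (C) can also be expressed programmatically once per-sample counts are available. The sketch below is a hedged illustration, not how VariantQC flags samples: the median-plus-MAD rule, the default thresholds, the function names and all sample IDs and counts are assumptions chosen for the example.

```python
from statistics import median


def flag_high_outliers(counts, k=5.0):
    """Return sample IDs whose count exceeds median + k * MAD.

    MAD is the median absolute deviation. If MAD is 0, any value above
    the median is flagged. The rule and default k are assumptions.
    """
    values = list(counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    threshold = med + k * mad
    return sorted(s for s, v in counts.items() if v > threshold)


def flag_sex_mismatch(chry_counts, reported_sex, max_female_chry=10):
    """Return samples listed as female with more chrY variants than allowed."""
    return sorted(
        s
        for s, n in chry_counts.items()
        if reported_sex.get(s) == "female" and n > max_female_chry
    )


# Hypothetical per-sample values, for illustration only.
singletons = {"s1": 100, "s2": 110, "s3": 95, "s4": 105, "s5": 900}
chry = {"s1": 2, "s2": 0, "s3": 850, "s4": 790}
sex = {"s1": "female", "s2": "female", "s3": "male", "s4": "female"}

print(flag_high_outliers(singletons))
print(flag_sex_mismatch(chry, sex))
```

Sorting an interactive table, as in the report, leaves the judgment to the analyst. A fixed rule like the one above is easier to automate but needs thresholds tuned to the dataset.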
