BMC Bioinformatics. 2017 Feb 28;18(1):133.
doi: 10.1186/s12859-017-1549-4.

BBCAnalyzer: a visual approach to facilitate variant calling

Sarah Sandmann et al. BMC Bioinformatics.

Abstract

Background: Deriving valid variant calling results from raw next-generation sequencing data is a particularly challenging task, especially in clinical diagnostics and personalized medicine. When using classical variant calling software, however, the user usually obtains nothing more than a list of variants that pass the caller's internal filters. Any expected mutations (e.g. hotspot mutations) that have not been called by the software need to be investigated manually.

Results: BBCAnalyzer (Bases By CIGAR Analyzer) provides a novel visual approach to facilitate the time-consuming manual inspection of common mutation sites. BBCAnalyzer visualizes base counts at predefined positions or regions in any sequence alignment data available as BAM files. The tool thus offers a straightforward way to evaluate any list of expected mutations, such as hotspot mutations, or even whole regions of interest. In addition to an ordinary textual report, BBCAnalyzer generates highly customizable plots. Each plot summarizes the counted number of bases, the reference bases, known mutations or polymorphisms, called mutations and base qualities. By combining this information graphically, the user can easily decide whether a variant is present, independently of any internal filters or frequency thresholds.
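As the tool's name suggests, counting bases at a position means reading each aligned read's CIGAR string to find which read base (or deletion, or insertion) lines up with a given reference coordinate. The following is a minimal Python sketch of that step, written for illustration only: it is not BBCAnalyzer's implementation (the package is written in R), and the function name `base_at` and its return convention are hypothetical. The CIGAR operation codes follow the SAM specification.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_pos, read_start, cigar, seq):
    """Hypothetical helper: what one read shows at 1-based reference ref_pos.

    read_start is the 1-based leftmost aligned reference position (SAM POS).
    Returns (base, inserted): base is the aligned read base, '-' if the
    position is deleted in the read, or None if the read does not cover it;
    inserted holds any bases inserted directly after ref_pos ('' if none).
    """
    ref, qi = read_start, 0  # current reference position and index into seq
    base, inserted = None, ""
    for length_str, op in CIGAR_RE.findall(cigar):
        n = int(length_str)
        if op in "M=X":  # consumes read and reference
            if ref <= ref_pos < ref + n:
                base = seq[qi + ref_pos - ref]
            ref += n
            qi += n
        elif op == "I":  # consumes read only; sits after reference base ref - 1
            if base is not None and ref - 1 == ref_pos:
                inserted = seq[qi:qi + n]
            qi += n
        elif op in "DN":  # consumes reference only; N (skip) leaves base as None
            if op == "D" and ref <= ref_pos < ref + n:
                base = "-"
            ref += n
        elif op == "S":  # soft clip: bases are in seq but not aligned
            qi += n
        # H (hard clip) and P (padding) consume neither read nor reference
    return base, inserted


# Toy read aligned at position 100 with one inserted base after position 102.
print(base_at(102, 100, "3M1I2M", "ACGTTA"))  # ('G', 'T')
```

Applying such a function to every read overlapping a position yields the per-read observations that a base-count plot summarizes.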

Conclusions: BBCAnalyzer provides a unique, novel approach to facilitate variant calling in cases where classical tools frequently fail to call a variant. The R package is freely available at http://bioconductor.org. The local web application is available as Additional file 2. Documentation of the R package (Additional file 1) and of the web application (Additional file 2) is included, with detailed descriptions, examples of all input and output elements, example code and example data. A video demonstrates example usage of the local web application (Additional file 3).

Additional file 3: Supplement_3. Video demonstrating example usage of the web application "BBCAnalyzer" (MP4, 11571 kb).

Keywords: Hotspot mutations; Next-generation sequencing; Personalized medicine; Variant calling; Visualization.


Figures

Fig. 1
Example output file from real patient data generated on an Illumina NextSeq. Relative number of reads at seven positions analyzed for sample “Example_Illumina”. Reference bases are plotted on the negative y axis; bases detected in the mapped reads are plotted on the positive y axis (a 5% threshold is marked). Likely SNV at chr1:115,258,747 (reference C, ∼70% of the reads with high-quality C and ∼30% of the reads with high-quality T). No variant at chr2:25,467,204 (reference G, ∼100% of the reads with high-quality G). Unlikely SNV at chr2:198,267,280 (reference C, ∼95% of the reads with low-quality C, ∼5% of the reads with low-quality A). Likely deletion at chr4:106,157,106 (reference A, ∼75% of the reads with high-quality A, ∼25% of the reads with deleted A). Known homozygous SNP at chr17:7,579,472 (reference G, polymorphism C displayed as an additional reference base, ∼100% of the reads with high-quality C). Possible insertion of a “G”, but unlikely deletion, at chr20:31,022,442 (reference G, ∼97% of the reads with high-quality G, ∼3% of the reads with deleted G, ∼30% of the reads with inserted high-quality G). Likely SNV at chr21:44,514,777 (reference T, ∼65% of the reads with high-quality T, ∼35% of the reads with high-quality G).
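The caption reads each position by turning per-read observations into relative read counts, separating high- from low-quality bases, and comparing each share with a marked threshold. A minimal Python sketch of that summary, under stated assumptions: the function `summarize_position`, the Phred cutoff `min_qual=20` and the category labels are hypothetical and not taken from BBCAnalyzer; only the 5% default threshold comes from the caption.

```python
from collections import Counter


def summarize_position(observations, min_qual=20, threshold=0.05):
    """Hypothetical summary of relative read counts at one position.

    observations: one (base, quality, inserted) tuple per covering read;
    base is 'A', 'C', 'G', 'T' or '-' (deletion), quality is the Phred base
    quality (ignored for deletions), inserted is '' or the inserted bases.
    min_qual is an assumed cutoff separating high- from low-quality bases.
    Returns (fractions, flagged): each category's share of reads, and the
    categories whose share reaches threshold.
    """
    n = len(observations)
    if n == 0:
        return {}, set()
    counts = Counter()
    for base, quality, inserted in observations:
        if base == "-":
            counts["del"] += 1
        elif quality >= min_qual:
            counts[base] += 1
        else:
            counts[base + "(lowQ)"] += 1
        if inserted:
            # Insertions are counted on top of the base call, so shares can sum to >1.
            counts["ins"] += 1
    fractions = {category: c / n for category, c in counts.items()}
    flagged = {category for category, f in fractions.items() if f >= threshold}
    return fractions, flagged


# Toy input: 70 high-quality C reads and 30 high-quality T reads.
reads = [("C", 35, "")] * 70 + [("T", 35, "")] * 30
fractions, flagged = summarize_position(reads)
print(sorted(flagged))  # ['C', 'T']
```

Counting insertions separately from the base call mirrors the chr20 entry above, where ∼97% G, ∼3% deleted and ∼30% inserted G add up to more than 100%.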
Fig. 2
Analysis of position chr20:31,022,442 with BBCAnalyzer. Relative number of reads (one bar plot per position; a 20% threshold is marked): UPN1 and UPN4 show an inserted G in almost 30% of the reads, whereas samples UPN2, UPN3 and UPN5 show no significant difference between the number of reads containing a deletion and the number of reads containing an insertion. Thus, only samples UPN1 and UPN4 are likely to carry the mutation chr20:31,022,441 A>AG.
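The decision in this caption compares, per sample, the share of reads with an insertion against the share with a deletion at the same site. One simple way to encode that comparison is sketched below; the rule "insertion share minus deletion share reaches the marked threshold" is an assumed heuristic for illustration, not the criterion used in the paper, and the function name and sample names are hypothetical.

```python
def flag_insertion_samples(samples, threshold=0.20):
    """Hypothetical heuristic: flag samples whose insertion share clearly
    exceeds their deletion share at one position.

    samples maps a sample name to (n_reads, n_insertion_reads,
    n_deletion_reads). A sample is flagged when
    insertion_share - deletion_share >= threshold (assumed rule).
    Returns the flagged sample names in sorted order.
    """
    flagged = []
    for name, (n_reads, n_ins, n_del) in samples.items():
        if n_reads == 0:
            continue  # no coverage, nothing to decide
        if n_ins / n_reads - n_del / n_reads >= threshold:
            flagged.append(name)
    return sorted(flagged)


# Toy counts: S1 has many more insertion than deletion reads; S2 does not.
print(flag_insertion_samples({"S1": (100, 30, 3), "S2": (100, 12, 10)}))  # ['S1']
```

A formal two-proportion test could replace the fixed difference if a statistical notion of "significant" is wanted; the fixed threshold keeps the sketch aligned with the threshold line drawn in the plots.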
Fig. 3
Analysis of position chr1:115,258,744 with BBCAnalyzer. Relative number of reads (one bar plot per position; a 3% threshold is marked): a: Simulated data. A low-frequency mutation (C>A) can be observed in samples SIM1, SIM2 and SIM3, but not in samples SIM4 and SIM5. b: Real data. As in the simulated data, the same low-frequency mutation can be observed in sample UPN1, but not in samples UPN2-UPN5. Thus, sample UPN1 is likely to carry the mutation chr1:115,258,744 C>A.
