BMC Bioinformatics. 2016 Apr 16;17:165.
doi: 10.1186/s12859-016-1014-9.

UNDR ROVER - a fast and accurate variant caller for targeted DNA sequencing

Daniel J Park et al. BMC Bioinformatics. 2016.

Abstract

Background: Previously, we described ROVER, a DNA variant caller which identifies genetic variants from PCR-targeted massively parallel sequencing (MPS) datasets generated by the Hi-Plex protocol. ROVER permits stringent filtering of sequencing chemistry-induced errors by requiring reported variants to appear in both reads of overlapping pairs above certain thresholds of occurrence. ROVER was developed in tandem with Hi-Plex and has been used successfully to screen for genetic mutations in the breast cancer predisposition gene PALB2. ROVER is applied to MPS data in BAM format and, therefore, relies on sequence reads being mapped to a reference genome. In this paper, we describe an improvement to ROVER, called UNDR ROVER (Unmapped primer-Directed ROVER), which accepts MPS data in FASTQ format, avoiding the need for a computationally expensive mapping stage. It does so by taking advantage of the location-specific nature of PCR-targeted MPS data.

Results: The UNDR ROVER algorithm achieves the same stringent variant calling as its predecessor with a significant runtime performance improvement. In one indicative sequencing experiment, UNDR ROVER (in its fastest mode) required 8-fold less sequential computation time than the ROVER pipeline and 13-fold less sequential computation time than a variant calling pipeline based on the popular GATK tool. UNDR ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). It requires as input a tab-delimited format file containing primer sequence information, a FASTA format file containing the reference genome sequence, and paired FASTQ files containing sequence reads. Primer sequences at the 5' end of reads associate read-pairs with their targeted amplicon and, thus, their expected corresponding coordinates in the reference genome. The primer-intervening sequence of each read is compared against the reference sequence from the same location and variants are identified using the same algorithm as ROVER. Specifically, for a variant to be 'called' it must appear at the same location in both of the overlapping reads above user-defined thresholds of minimum number of reads and proportion of reads.
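The primer-directed calling described above can be illustrated with a minimal Python sketch. It is not the UNDR ROVER source code, and it rests on several simplifying assumptions that go beyond the abstract: primers are matched as exact string prefixes, only single-base substitutions are detected by position-wise comparison (no insertions or deletions), base qualities are ignored, and the proportion threshold is measured against the number of read-pairs assigned to the amplicon. The names `call_variants`, `substitutions`, and the amplicon dictionary layout (`fwd`, `rev`, `start`, `ref_insert`) are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def substitutions(insert, ref_insert, start):
    """Compare an insert to the reference base by base (substitutions only)."""
    return {
        (start + i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(ref_insert, insert))
        if ref_base != alt_base
    }


def call_variants(read_pairs, amplicons, min_reads=2, min_proportion=0.05):
    """Call substitutions seen in BOTH reads of a pair, above both thresholds."""
    coverage = Counter()  # read-pairs assigned to each amplicon
    support = Counter()   # read-pairs in which both reads agree on a variant
    for read1, read2 in read_pairs:
        # The 5' primers assign the pair to an amplicon; no mapping step is needed.
        for name, amp in amplicons.items():
            if read1.startswith(amp["fwd"]) and read2.startswith(amp["rev"]):
                break
        else:
            continue  # no primer match: discard the pair
        n = len(amp["ref_insert"])
        insert1 = read1[len(amp["fwd"]):len(amp["fwd"]) + n]
        # Read 2 comes from the opposite strand, so flip it before comparing.
        insert2 = reverse_complement(read2[len(amp["rev"]):len(amp["rev"]) + n])
        coverage[name] += 1
        agreed = (substitutions(insert1, amp["ref_insert"], amp["start"])
                  & substitutions(insert2, amp["ref_insert"], amp["start"]))
        for variant in agreed:
            support[(name,) + variant] += 1
    return sorted(
        key for key, count in support.items()
        if count >= min_reads and count / coverage[key[0]] >= min_proportion
    )


# Toy data (invented for illustration): one amplicon starting at coordinate 100.
amplicons = {"amp1": {"fwd": "ACGT", "rev": "TTGC",
                      "start": 100, "ref_insert": "GATTACA"}}


def make_pair(insert1, insert2):
    """Build a read-pair whose reads carry the given inserts."""
    return ("ACGT" + insert1, "TTGC" + reverse_complement(insert2))


pairs = (
    [make_pair("GACTACA", "GACTACA")] * 3   # T>C at 102 in both reads
    + [make_pair("GATTAGA", "GATTACA")]     # error in read 1 only
    + [make_pair("GATTACA", "GATTACA")]     # reference pair
)
print(call_variants(pairs, amplicons, min_reads=2, min_proportion=0.5))
# [('amp1', 102, 'T', 'C')]
```

In this toy run the substitution at coordinate 102 is supported by both reads in 3 of 5 pairs and is called, whereas the change seen in only one read of a pair is filtered out, which mirrors the overlap requirement described for ROVER.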

Conclusions: UNDR ROVER provides the same rapid and accurate genetic variant calling as its predecessor with greatly reduced computational costs.

Keywords: Hi-Plex; Massively parallel sequencing; PCR-MPS; ROVER; Targeted sequencing; Variant calling.

Figures

Fig. 1
Hi-Plex library structure and overlapping reads. The center rectangle represents the target insert DNA sequence flanked by gene-specific primer (GSP) sites (blue) and adapter sequences (green). The two reads of a pair are shown in yellow. The 5′ end of each read starts with its corresponding gene-specific primer sequence. The insert size is chosen so that both reads overlap the target insert sequence completely. The 3′ ends of reads may extend into the adapter sequence depending on the read length and the presence/absence of insertions/deletions in the template DNA. The diagram is not to scale. Typically, the insert sequence will be significantly longer than the primer sequences.
Fig. 2
Pseudocode for the variant calling algorithm employed by UNDR ROVER.
Fig. 3
Runtime comparison of GATK, ROVER and UNDR ROVER. Total sequential computing time of the GATK pipeline, the ROVER pipeline and UNDR ROVER (thorough, genotyping and fast modes) when applied to 95 Hi-Plex samples targeting PALB2 and XRCC2 with 60 primer pairs in the PCR. The computing times for the GATK and ROVER pipelines are decomposed into alignment with Bowtie (blue), conversion of the alignment file from SAM to BAM format (yellow), indexing and sorting of the BAM file (grey), and variant calling (light red for GATK and green for ROVER). Computing times for UNDR ROVER are shown for the thorough mode (brown), the fast mode with SNV genotyping (orange), and the fast mode without SNV genotyping (purple).

