Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2014 Jan 24;9(1):3.
doi: 10.1186/1751-0473-9-3.

ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets

Affiliations

ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets

Bernard J Pope et al. Source Code Biol Med. .

Abstract

Background: We recently described Hi-Plex, a highly multiplexed PCR-based target-enrichment system for massively parallel sequencing (MPS), which allows the uniform definition of library size so that subsequent paired-end sequencing can achieve complete overlap of read pairs. Variant calling from Hi-Plex-derived datasets can thus rely on the identification of variants appearing in both reads of read-pairs, permitting stringent filtering of sequencing chemistry-induced errors. These principles underly ROVER software (derived from Read Overlap PCR-MPS variant caller), which we have recently used to report the screening for genetic mutations in the breast cancer predisposition gene PALB2. Here, we describe the algorithms underlying ROVER and its usage.

Results: ROVER enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and threshold tailorability enables broad access for a range of PCR-MPS users.

Methods: ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). The software accepts a tab-delimited text file listing the coordinates of the target-specific primers used for targeted enrichment based on a specified genome-build. It also accepts aligned sequence files resulting from mapping to the same genome-build. ROVER identifies the amplicon a given read-pair represents and removes the primer sequences by using the mapping co-ordinates and primer co-ordinates. It considers overlapping read-pairs with respect to primer-intervening sequence. Only when a variant is observed in both reads of a read-pair does the signal contribute to a tally of read-pairs containing or not containing the variant. A user-defined threshold informs the minimum number of, and proportion of, read-pairs a variant must be observed in for a 'call' to be made. ROVER also reports the depth of coverage across amplicons to facilitate the identification of any regions that may require further screening.

Conclusions: ROVER can facilitate rapid and accurate genetic variant calling for a broad range of PCR-MPS users.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Pseudocode illustrating the main variant calling algorithm employed by ROVER.
Figure 2
Figure 2
Illustration of the ROVER principles. The lower panel represents contiguous sections of a reference sequence (e.g. a specific human genome build). Numbered ‘F’ and ‘R’ pairs denote primer co-ordinates used in conjunction with mapped read starting co-ordinates to guide clipping (dotted boxes). Shaded boxes represent regions under analysis. Arrows represent reads of read-pairs. ‘X’s denote positions different from the reference sequence. Only when both reads of a read-pair concur can the read-pair contribute to genetic variant calling for a given position (circled). This contribution to a read-pair pile-up allows variant calling using thresholds for minimum number of read-pairs and the proportion of read-pairs containing the variant.

References

    1. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24:133–141. doi: 10.1016/j.tig.2007.12.007. - DOI - PubMed
    1. Moorthie S, Mattocks CJ, Wright CF. Review of massively parallel DNA sequencing technologies. The HUGO Journal. 2011;5:1–12. - PMC - PubMed
    1. Morgan JE, Carr IM, Sheridan E, Chu CE, Hayward B, Camm N, Lindsay HA, Mattocks CJ, Markham AF, Bonthron DT, Taylor GR. Genetic diagnosis of familial breast cancer using clonal sequencing. Hum Mutat. 2010;31:484–491. doi: 10.1002/humu.21216. - DOI - PubMed
    1. Meldrum C, Doyle MA, Tothill RW. Next-generation sequencing for cancer diagnostics: a practical perspective. Clin Biochem Rev. 2011;32:177–195. - PMC - PubMed
    1. Nguyen-Dumont T, Pope BJ, Hammet F, Southey MC, Park DJ. A high-plex PCR approach for massively parallel sequencing. Biotechniques. 2013;55:69–74. - PubMed