BMC Bioinformatics. 2014 Jun 10:15:180.
doi: 10.1186/1471-2105-15-180.

PBHoney: identifying genomic variants via long-read discordance and interrupted mapping

Adam C English et al. BMC Bioinformatics.

Abstract

Background: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads.

Results: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/.

Conclusions: By implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney proves effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney discovers two genomic features: the presence of Rac-Phage in the isolate and evidence of E. coli's circular genome.


Figures

Figure 1
Tail Schematic. Schematic of possible tails created by reads representing a deletion (a) and an inversion (c) allele and the structural variants they represent when mapped to the reference (b). Rectangles represent double-stranded genomic sequence. Arrows above and below a rectangle represent reads mapping to the direct and complement strands, respectively. In these examples, all initial alignments align at the 5’ breakpoint of the reference. The read spanning a deletion event creates an epilog that maps to the 3’ breakpoint on the same strand as its corresponding initial alignment. The reads spanning the inversion event breakpoints create prologs that map on the opposite strands of their corresponding initial alignment. While all three piece-alignments would cluster if we considered only their location, their orientations support two separate events in the reference region.
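The point of the schematic, that nearby piece-alignments should be separated by orientation before they are counted as support for an event, can be illustrated with a minimal sketch. This is not PBHoney's implementation: the `Tail` record, the `group_tails` function, the fixed-width position binning, and the 500 bp window are all hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tail:
    """Hypothetical record for one tail (prolog or epilog) piece-alignment."""
    read_id: str
    position: int      # reference coordinate where the tail maps
    same_strand: bool  # True if the tail maps on the same strand as its initial alignment


def group_tails(tails, window=500):
    """Group tails by coarse position bin AND relative orientation.

    Binning by `position // window` is a crude stand-in for real clustering;
    it is an assumption of this sketch, not the method used by PBHoney.
    """
    groups = defaultdict(list)
    for tail in tails:
        key = (tail.position // window, tail.same_strand)
        groups[key].append(tail.read_id)
    return dict(groups)


# Three tails that land close together, as in the schematic:
# one same-strand epilog (deletion allele) and two opposite-strand
# prologs (inversion allele).
tails = [
    Tail("read_deletion", 10_020, same_strand=True),
    Tail("read_inversion_1", 10_050, same_strand=False),
    Tail("read_inversion_2", 10_080, same_strand=False),
]
groups = group_tails(tails)
for (position_bin, same_strand), read_ids in sorted(groups.items()):
    label = "same strand" if same_strand else "opposite strand"
    print(position_bin, label, read_ids)
```

Grouping on location alone would merge all three reads into one cluster; adding orientation to the key splits them into two groups, matching the two separate events described in the caption.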
Figure 2
Simulated ALU Deletion. Plot (a) depicts the raw channels for the 327 bp ALU deletion. Raw channels include coverage (COV), mismatches (MIS), insertions (INS), and deletions (DEL). Plot (b) shows the channels after smoothing, and plot (c) shows the final signal after applying the slope kernel. The gray lines mark the start and end points of the deletion.
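The smooth-then-slope idea in this caption can be sketched on a synthetic channel. The caption does not give the kernel shapes or window sizes, so everything below is an assumption: a centered moving average stands in for the smoothing step, a central difference stands in for the slope kernel, and the deletion channel is an idealized step placed at hypothetical coordinates 300 to 627 (a 327 bp interval).

```python
def moving_average(values, window):
    """Centered moving average; positions near the edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def central_slope(values, span):
    """Central difference over `span` positions on each side; zero at the edges."""
    n = len(values)
    slopes = [0.0] * n
    for i in range(span, n - span):
        slopes[i] = (values[i + span] - values[i - span]) / (2 * span)
    return slopes


# Idealized per-base deletion rate (DEL channel): low background,
# high inside a hypothetical 327 bp deleted interval.
start, end, length = 300, 627, 1000
del_channel = [0.9 if start <= i < end else 0.05 for i in range(length)]

smoothed = moving_average(del_channel, window=51)
signal = central_slope(smoothed, span=10)

# The strongest positive slope marks the rise into the deletion and the
# strongest negative slope marks the fall out of it.
rise = max(range(length), key=lambda i: signal[i])
fall = min(range(length), key=lambda i: signal[i])
print("rise near", rise, "fall near", fall)
```

Because smoothing spreads each step over the window width, the extremes of the slope signal locate a breakpoint only to within roughly half a window, which is why the sketch reports approximate positions rather than exact coordinates.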
