Genome Biol. 2021 May 25;22(1):161.
doi: 10.1186/s13059-021-02380-5.

Samplot: a platform for structural variant visual validation and automated filtering

Jonathan R Belyeu et al. Genome Biol. 2021.

Abstract

Visual validation is an important step to minimize false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments necessary to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Samplot creates multi-technology images specialized for SV call review. A putative deletion call is shown, with the call and confidence intervals at the top of the image (represented by a dark bar and smaller lines). Two sequence alignment tracks follow, containing Illumina paired-end sequencing and Pacific Biosciences (PacBio) long-read sequencing data, each alignment file plotted as a separate track in the image. PacBio data is further divided by haplotype (HP) into subplots. Reads are indicated by horizontal lines and color-coded for alignment type (concordant/discordant insert size, pair order, split alignment, or long read). The coverage for the region is shown with the gray-filled background, which is split into mapping quality above or below a user-defined threshold (in dark or light gray, respectively). An annotation from the Tandem Repeats Finder [16] indicates where genomic repeats occur. A gene annotation track shows the position of introns (thin blue line) and exons (thick blue line) near the variant; a small blue arrow on the right denotes the direction of transcription for the gene.
Fig. 2
Samplot images of duplication, inversion, and translocation variants. a A duplication variant plotted by Samplot with Illumina short-read sequencing evidence. Reads plotted in red have large insert sizes and inverted pair order (reverse strand followed by forward strand instead of forward followed by reverse), indicating potential support for a duplication. b An inversion variant, with Illumina sequencing evidence. Reads plotted in blue have large insert sizes and same-direction pair alignments (both reads on forward strand, or both on reverse strand). c A translocation variant, with Illumina sequencing. Discordant pairs align to each breakpoint. The blue color of the reads and extremely large insert sizes of these grouped discordant pairs indicate a large inverted translocation
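The read-pair signals described in this caption can be sketched as a small classifier. This is only an illustration of the general logic, not Samplot's implementation: the `ReadPair` type, the `classify_pair` function, the category labels, and the 1000 bp threshold are all hypothetical. The "deletion-like" category for normally oriented pairs with a large insert size is the standard interpretation and is an assumption beyond what the caption states.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal description of an aligned read pair."""
    left_strand: str   # strand of the leftmost-aligned read: "+" or "-"
    right_strand: str  # strand of the rightmost-aligned read: "+" or "-"
    insert_size: int   # distance spanned by the pair, in bp


def classify_pair(pair: ReadPair, max_concordant_insert: int = 1000) -> str:
    """Label a pair by the SV type its alignment pattern may support.

    The threshold is an assumed value; in practice it depends on the
    insert-size distribution of the sequencing library.
    """
    # Both reads on the same strand: potential inversion support.
    if pair.left_strand == pair.right_strand:
        return "inversion-like"
    # Reverse strand followed by forward strand: potential duplication support.
    if pair.left_strand == "-" and pair.right_strand == "+":
        return "duplication-like"
    # Normal forward/reverse order but too far apart (assumed interpretation).
    if pair.insert_size > max_concordant_insert:
        return "deletion-like"
    return "concordant"


if __name__ == "__main__":
    for p in (ReadPair("+", "-", 400), ReadPair("-", "+", 5000), ReadPair("+", "+", 5000)):
        print(p, classify_pair(p))
```

A real caller or viewer would also weigh mapping quality, split alignments, and the number of supporting pairs before treating any single pair as evidence.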
Fig. 3
Samplot creates images for a quick review of SV VCF files. The “samplot vcf” command plots all SVs in a VCF file or a subset selected by user-defined filter statements. It creates an index page and sends commands to “samplot plot,” which generates an image for each variant that passes the filters. The index.html page displays a table of variant information. Clicking on a row loads the corresponding Samplot image, allowing additional filtering or variant prioritization.
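The filter-then-plot workflow in this caption can be sketched as follows. This is a hypothetical illustration, not how “samplot vcf” is implemented: the `build_plot_commands` function, the dictionary record layout, the filter parameters, and the output file naming are all assumptions. The `samplot plot` flags (`-n`, `-b`, `-o`, `-c`, `-s`, `-e`, `-t`) follow the usage shown in the project README as best recalled and should be checked against `samplot plot -h`. The sketch only builds argument lists and does not run anything.

```python
from typing import Iterable, Optional


def build_plot_commands(
    variants: Iterable[dict],
    bams: list,
    sample_names: list,
    min_len: int = 0,
    sv_types: Optional[set] = None,
) -> list:
    """Return one `samplot plot` argument list per variant that passes the filters.

    Each variant is a hypothetical dict with keys chrom, start, end, svtype.
    """
    commands = []
    for v in variants:
        # Filter 1: drop variants shorter than the requested minimum length.
        if v["end"] - v["start"] < min_len:
            continue
        # Filter 2: keep only the requested SV types, if any were given.
        if sv_types is not None and v["svtype"] not in sv_types:
            continue
        out = f'{v["svtype"]}_{v["chrom"]}_{v["start"]}_{v["end"]}.png'
        commands.append([
            "samplot", "plot",
            "-n", *sample_names,   # sample labels, one per alignment file
            "-b", *bams,           # alignment files to draw as tracks
            "-o", out,             # output image (assumed naming scheme)
            "-c", v["chrom"],
            "-s", str(v["start"]),
            "-e", str(v["end"]),
            "-t", v["svtype"],
        ])
    return commands


if __name__ == "__main__":
    calls = [
        {"chrom": "chr4", "start": 1000, "end": 6000, "svtype": "DEL"},
        {"chrom": "chr7", "start": 500, "end": 700, "svtype": "DUP"},
    ]
    for cmd in build_plot_commands(calls, ["sample.bam"], ["sample"], min_len=1000):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` once the flags have been verified against the installed Samplot version.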
Fig. 4
SV filtering performance of duphold (DHFFC) and Samplot-ML. a–c Short-read SV call sets generated by LUMPY/SVTYPER and MANTA were filtered by Samplot-ML, duphold (DHFFC), Paragraph, and SV2, then compared to the long-read-validated truth set. d Long-read SVs were called with Sniffles, filtered by Samplot-ML, then compared to the GIAB truth set.
Fig. 5
Model performance in data sets that differ from the training set. a The number of true-positive and false-positive SVs from different SV calling and filtering methods considering the same sample (HG002), sequenced using two libraries with different coverages, read lengths, and insert sizes. b, c The percent increase in true-positive SVs that Samplot-ML recovers versus duphold (b) and SV2 (c) for SVs in simulated mixtures of samples (CHM13 and CHM1 cell lines) at different rates

References

    1. Stefansson H, Rujescu D, Cichon S, Pietiläinen OPH, Ingason A, Steinberg S, et al. Large recurrent microdeletions associated with schizophrenia. Nature. 2008;455:232–236. doi: 10.1038/nature07229.
    2. Ma R, Deng L, Xia Y, Wei X, Cao Y, Guo R, et al. A clear bias in parental origin of de novo pathogenic CNVs related to intellectual disability, developmental delay and multiple congenital anomalies. Sci Rep. 2017;7:44446. doi: 10.1038/srep44446.
    3. Xu B, Roos JL, Levy S, van Rensburg EJ, Gogos JA, Karayiorgou M. Strong association of de novo copy number mutations with sporadic schizophrenia. Nat Genet. 2008;40:880–885. doi: 10.1038/ng.162.
    4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015; Available from: https://www.nature.com/nature/journal/v526/n7571/pdf/nature15394.pdf.
    5. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454. doi: 10.1038/nature05329.
