Bioinformatics. 2018 May 15;34(10):1774-1777.
doi: 10.1093/bioinformatics/btx813.

SV2: accurate structural variation genotyping and de novo mutation detection from whole genomes

Danny Antaki et al. Bioinformatics.

Abstract

Motivation: Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease.

Results: Here, we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations.

Availability and implementation: SV2 is freely available on GitHub (https://github.com/dantaki/SV2).

Contact: jsebat@ucsd.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
SV2 workflow. SV2 requires a VCF file of SNVs, a BAM file, and a set of SVs to genotype as input. Before genotyping, a preprocessing step records the median coverage, insert size, and read length for feature normalization. Features for genotyping, which include depth of coverage, discordant paired-ends, split-reads, and heterozygous allele ratio (HAR), are measured for each SV. SVs are then genotyped with an ensemble of support vector machine classifiers. SV2 produces two output files, a BED file and a VCF, containing annotations for RefSeq genic elements, RepeatMasker repeats, segmental duplications, short tandem repeats, and common SVs from the 1000 Genomes phase 3 call set.
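The role of the preprocessing step can be illustrated with a minimal sketch of coverage normalization. This is not SV2's implementation: the function names `normalized_coverage` and `naive_copy_number` are hypothetical, and the rounding rule stands in for the support vector machine classifiers described above. It only shows why a genome-wide median coverage is needed before a depth feature can be compared across samples.

```python
from statistics import median


def normalized_coverage(sv_depths, genome_depths):
    """Mean depth inside the SV divided by the genome-wide median depth.

    Both arguments are lists of per-position (or per-bin) read depths.
    A ratio near 1.0 is what a diploid, copy-number-2 region would show.
    """
    if not sv_depths:
        raise ValueError("sv_depths must not be empty")
    baseline = median(genome_depths)
    if baseline <= 0:
        raise ValueError("median coverage must be positive")
    return (sum(sv_depths) / len(sv_depths)) / baseline


def naive_copy_number(ratio):
    """Illustrative stand-in for a classifier: nearest integer copy number.

    Assumes a diploid baseline, so a ratio of 1.0 maps to copy number 2.
    """
    return max(0, round(2 * ratio))


# Hypothetical depths: the SV region has half the genome-wide coverage.
ratio = normalized_coverage([15, 15, 15], [29, 30, 30, 31])
copy_number = naive_copy_number(ratio)  # consistent with a heterozygous deletion
```

A single depth feature like this is ambiguous in noisy or repetitive regions, which is one reason a genotyper would combine it with discordant paired-ends, split-reads, and allele ratios rather than rely on rounding.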
Fig. 2.
SV2 genotyping performance. (A) False discovery rate across SV2 genotype likelihoods estimated from Illumina 2.5M arrays (N = 57) and PacBio long reads (N = 9). Black dotted line indicates 5% FDR. (B) Group-wise transmission disequilibrium tests across SV2 genotype likelihoods in 630 offspring, with shaded regions representing one standard deviation. (C) ROC curves of WGS genotyping calculated from Illumina 2.5M arrays for SV2, SVTyper, and Manta in 57 individuals. (D) ROC curves of WGS genotyping calculated from supporting PacBio long reads for SV2, SVTyper, and Manta for SVs in nine individuals.
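As a rough illustration of how a curve like panel (A) could be computed, the sketch below estimates the false discovery rate among calls that pass a genotype-likelihood cutoff. It is an assumption-laden example, not the authors' analysis code: the function name `fdr_at_threshold`, the `(score, validated)` input layout, and the scores themselves are all hypothetical, with `validated` standing for support from an orthogonal platform such as arrays or long reads.

```python
def fdr_at_threshold(calls, threshold):
    """Fraction of unvalidated calls among those scoring >= threshold.

    calls: iterable of (score, validated) pairs, where validated is True
    when an orthogonal platform supports the call.
    Returns None when no call passes the threshold.
    """
    kept = [validated for score, validated in calls if score >= threshold]
    if not kept:
        return None
    false_positives = sum(1 for validated in kept if not validated)
    return false_positives / len(kept)


# Hypothetical scores, for illustration only.
example_calls = [(30, True), (25, True), (20, False), (12, True), (8, False)]

# Sweep a few cutoffs; stricter cutoffs keep fewer calls.
fdr_by_cutoff = {t: fdr_at_threshold(example_calls, t) for t in (10, 22, 40)}
```

Sweeping the cutoff and reading off where the estimate drops below 0.05 corresponds to the 5% FDR line in the panel.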

