2019 Nov 1;35(22):4782-4787.
doi: 10.1093/bioinformatics/btz492.

svtools: population-scale analysis of structural variation

David E Larson et al. Bioinformatics.

Abstract

Summary: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller-scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high-quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies.

Availability and implementation: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The svtools pipeline. SVs are detected separately in each sample using LUMPY. Breakpoint probability distributions are utilized to merge and refine the coordinates of SV breakpoints within a cohort, followed by parallelized re-genotyping and copy number annotation. Variants are merged into a single cohort-level VCF file and variant types are classified using the combined breakpoint genotype and read-depth information.
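
The merging step in the caption can be illustrated with a toy example. The sketch below is not the svtools implementation: it assumes each sample reports a breakpoint as a start position plus a list of per-base probabilities, and that distributions are combined by simple summation followed by renormalization. The function name `merge_breakpoint_distributions` and the input layout are hypothetical.

```python
from collections import defaultdict


def merge_breakpoint_distributions(calls):
    """Combine per-sample breakpoint probability distributions.

    calls: list of (start_position, probabilities) pairs, one per sample,
    where probabilities[i] is the weight assigned to start_position + i.
    Assumption: summing normalized distributions is used here only for
    illustration; the actual tool may weight or combine them differently.
    """
    combined = defaultdict(float)
    for start, probs in calls:
        total = sum(probs)
        for offset, p in enumerate(probs):
            # Normalize each sample so that every sample contributes equally.
            combined[start + offset] += p / total

    # Renormalize the cohort-level distribution so it sums to 1.
    norm = sum(combined.values())
    merged = {pos: weight / norm for pos, weight in combined.items()}

    # Refined coordinate: the most probable position (lowest position on ties).
    best = max(sorted(merged), key=lambda pos: merged[pos])
    return best, merged


# Three hypothetical samples supporting roughly the same breakpoint.
example_calls = [
    (100, [0.1, 0.2, 0.4, 0.2, 0.1]),
    (101, [0.2, 0.5, 0.2, 0.1]),
    (99, [0.1, 0.1, 0.2, 0.4, 0.2]),
]
refined_position, cohort_distribution = merge_breakpoint_distributions(example_calls)
```

In this toy input all three samples place most of their weight on position 102, so the merged distribution peaks there even though the individual calls start at different coordinates.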
