BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):53.
doi: 10.1186/s12859-017-1464-8.

Pysim-sv: a package for simulating structural variation data with GC-biases

Yuchao Xia et al. BMC Bioinformatics.

Abstract

Background: Structural variations (SVs) are widespread in human genomes and may have important implications in disease-related and evolutionary studies. High-throughput sequencing (HTS) has become a major platform for SV detection, and simulation serves as a powerful and cost-effective approach for benchmarking SV detection algorithms. Accurate performance assessment by simulation requires a simulator capable of generating data with all the important features of real data, such as GC biases in HTS data and the various complexities of tumor data. However, no available package has systematically addressed all of these issues in data simulation for SV benchmarking.

Results: Pysim-sv is a package for simulating HTS data to evaluate the performance of SV detection algorithms. Pysim-sv can introduce a wide spectrum of germline and somatic genomic variations. The package can simulate tumor data with aneuploidy and heterogeneous subclones, which is useful for assessing algorithm performance in tumor studies. Furthermore, Pysim-sv can introduce GC-bias, the most important and prevalent bias in HTS data, into the simulated reads.

Conclusions: Pysim-sv provides an unbiased toolkit for evaluating HTS-based SV detection algorithms.

Keywords: Breakpoints; Copy number variation; Next-generation sequencing; Translocation.
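
One way to picture GC-bias introduction is as read thinning: each simulated read is kept with a probability that depends on its GC proportion. The sketch below is only an illustration of that idea under stated assumptions. The function names and the bias curve `hypothetical_bias` are made up for this example; they are not the Pysim-sv API and not the functions f1, f2 or f3 used in the paper.

```python
import random


def gc_fraction(seq):
    """Fraction of G/C among A/C/G/T bases; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def hypothetical_bias(gc):
    """Assumed unimodal acceptance curve peaking at 45% GC (illustrative only)."""
    return max(0.0, 1.0 - ((gc - 0.45) / 0.45) ** 2)


def thin_reads(reads, bias=hypothetical_bias, seed=0):
    """Keep each read with probability bias(GC proportion of the read)."""
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < bias(gc_fraction(r))]
```

Reads with a GC proportion near the peak of the curve are almost always kept, while extremely AT-rich or GC-rich reads are mostly dropped, which produces a GC-dependent coverage profile of the kind the abstract describes.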

Figures

Fig. 1
The workflow of Pysim-sv. Component 1 simulates a personal genome by introducing genomic variations to a given reference genome. Component 2 generates tumor genomes by simulating aneuploidy and somatic variations. Subclones are iteratively generated. Component 3 generates HTS reads, mixes reads from different tumor/normal genomes and introduces GC-bias
Fig. 2
The GC-dependency in real data and simulated data. The GC-dependency in (a, b, c) three real sequencing datasets from the 1000 Genomes Project and in (d, e, f) three simulated datasets generated by Pysim-sv. The x-axis is the GC proportion in 10 Kb bins and the y-axis is the number of mapped reads in each bin. Note that the lower bands in the left and middle panels of (a, b, c) correspond to bins on chromosome X, and the two individuals shown there are male. The functions in (d, e, f) are f1, f2 and f3, as presented in the GC-bias introduction section
Fig. 3
The sensitivity of the four SV detection algorithms with different parameters. Deletions (black), inversions (grey) and translocations (white) are compared individually. a, b and c are the simulated data with GC-bias, and d, e and f are the simulated data without GC-bias. The purities are 1 (a, d), 0.8 (b, e) and 0.5 (c, f)
Fig. 4
True CNVs in a simulated genome and CNVs detected by BIC-seq2. a Forty CNVs were introduced in the simulated genome, and thirty-five copy number gains (red lines) and copy number losses (blue lines) were detected by BIC-seq2. b True copy number gains (red lines) and copy number losses (blue lines) in the simulated genome
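
Sensitivity, as plotted in Fig. 3, is the fraction of simulated SVs that a caller recovers. The sketch below shows one plausible way to compute it from a truth list and a call list. The tuple layout, the `tolerance` of 100 bp and the matching rule (same type, same chromosome, both breakpoints within the tolerance) are assumptions made for illustration; the paper's actual matching criteria are not stated here, and translocations spanning two chromosomes would need a richer record than this one.

```python
def sensitivity(true_svs, called_svs, tolerance=100):
    """Fraction of true SVs matched by a call.

    Each SV is a tuple (sv_type, chrom, start, end). A call matches a true SV
    when type and chromosome agree and both breakpoints lie within `tolerance`
    bases. Each call can be matched to at most one true SV.
    """
    used = set()  # indices of calls already matched
    hits = 0
    for t_type, t_chrom, t_start, t_end in true_svs:
        for i, (c_type, c_chrom, c_start, c_end) in enumerate(called_svs):
            if i in used:
                continue
            if (
                c_type == t_type
                and c_chrom == t_chrom
                and abs(c_start - t_start) <= tolerance
                and abs(c_end - t_end) <= tolerance
            ):
                used.add(i)
                hits += 1
                break
    return hits / len(true_svs) if true_svs else 0.0


# Hypothetical example data: one deletion is recovered, one inversion is missed.
truth = [("DEL", "chr1", 1000, 2000), ("INV", "chr1", 5000, 6000)]
calls = [("DEL", "chr1", 1010, 1990), ("INV", "chr2", 5000, 6000)]
print(sensitivity(truth, calls))
```

Computing this separately per SV type and per purity level would give the kind of grouped comparison shown in the figure.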
