PLoS One. 2018 Apr 5;13(4):e0194472.
doi: 10.1371/journal.pone.0194472. eCollection 2018.

Xome-Blender: A novel cancer genome simulator

Roberto Semeraro et al. PLoS One.

Abstract

The adoption of next-generation sequencing based methods in cancer research has allowed the investigation of the complex genetic structure of tumor samples. In the last few years, considerable attention has been given to the study of somatic variants, and several computational approaches have been developed for this purpose. Despite continuous improvements to these programs, validating their results is a hard challenge because of multiple sources of error. To overcome this drawback, different simulation approaches are used to generate synthetic samples, but they are often based on the addition of artificial mutations that mimic the complexity of genomic variations. For these reasons, we developed a novel software, Xome-Blender, that generates synthetic cancer genomes with user-defined features such as the number of subclones, the number of somatic variants and the presence of copy number alterations (CNAs), without the addition of any synthetic element. The distinctive feature of our method is the "morphological approach" used to generate mutation events. To demonstrate the power of our tool, we used it to address the hard challenge of evaluating the performance of nine state-of-the-art somatic variant calling methods for small and large variants (VarScan2, MuTect, Shimmer, BCFtools, Strelka, EXCAVATOR2, Control-FREEC and CopywriteR). Through these analyses we observed that Xome-Blender data make it possible to appraise small differences in their performance, and we designated VarScan2 and EXCAVATOR2 as the best tools for this kind of application. Xome-Blender is Unix-based, licensed under the GPLv3 and freely available at https://github.com/rsemeraro/XomeBlender.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Xome-Blender workflow.
Panel A shows how InXalizer works. First, it calculates the input coverage; next, it checks the user-defined parameters (number of subclones, number of somatic variants, and the presence or absence of a target file) and generates the subclones according to the selected subclonal architecture. Finally, if desired, it produces a CNA file. Panel B shows the Xome-Blender workflow. First, it checks parameter compatibility; next, according to the user-defined percentages, it generates subsamples of the BAM files produced by InXalizer; finally, it adds the CNAs (if a CNA file is provided) and merges the BAM files into the final product.
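The subsampling step in panel B can be illustrated with a small calculation. The sketch below is not Xome-Blender's code. It assumes that each subclone BAM is downsampled by a fraction proportional to its user-defined percentage, scaled by the ratio of the desired final coverage to the input coverage, and that all subclone BAMs share the same input coverage. The function name `subsample_fractions` and its parameters are hypothetical.

```python
def subsample_fractions(percentages, input_coverage, target_coverage):
    """Return the fraction of reads to keep from each subclone BAM.

    Assumption (not taken from the paper): every subclone BAM has the same
    mean coverage `input_coverage`, and the merged sample should reach
    `target_coverage` with each subclone contributing its percentage.
    """
    if abs(sum(percentages) - 100) > 1e-9:
        raise ValueError("subclone percentages must sum to 100")
    if input_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    fractions = [
        p * target_coverage / (100 * input_coverage) for p in percentages
    ]
    if any(f > 1 for f in fractions):
        raise ValueError("target coverage is too high for the input coverage")
    return fractions


if __name__ == "__main__":
    # Three subclones at 50%, 30% and 20%, mixed from 200x inputs to 100x.
    print(subsample_fractions([50, 30, 20], input_coverage=200, target_coverage=100))
```

Fractions of this kind could then be handed to any read downsampler before the BAM files are merged; which tool Xome-Blender uses internally for this step is not stated in the caption.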
Fig 2. Xome-Blender evaluation.
The box plot in panel A reports the variance of the ratio between the subsample and the full-sample coverage for each percentage. Panel B displays the distribution of the mean coverage value in the full sample and in four different subsamples. Each data point is averaged across ten synthetic replicates. Panels C, D and E show the expected vs. synthetic AF for SNVs, insertions and deletions, respectively. The violin plots report the distribution of synthetic AF for bins of expected AF, averaged across four coverage values (50×, 100×, 150× and 200×). The outer graphs show the distribution of the AF deviation (difference between synthetic and expected AF) averaged across the three pairs. R represents the Pearson correlation coefficient. Legend colors refer to the average coverage of the analyzed data. Panel F shows the expected vs. synthetic log2-ratio.
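The two quantities named in this caption, the AF deviation and the Pearson correlation coefficient, can be computed as in the following minimal sketch. It uses only the standard definitions; the function names and the example values are made up for illustration and are not data from the paper.

```python
import math


def af_deviation(synthetic_af, expected_af):
    """Per-variant deviation: synthetic AF minus expected AF."""
    if len(synthetic_af) != len(expected_af):
        raise ValueError("inputs must have the same length")
    return [s - e for s, e in zip(synthetic_af, expected_af)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only.
    expected = [0.10, 0.25, 0.50, 0.75]
    synthetic = [0.12, 0.24, 0.47, 0.78]
    print(af_deviation(synthetic, expected))
    print(pearson_r(expected, synthetic))
```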
Fig 3. Methods performance.
Panels A-R represent precision and recall as a function of coverage and contamination. Panels A-F contain the SNV data, G-L the insertion data and M-R the deletion data. Panels A-M, C-O and E-Q represent the precision as a function of coverage, normal contamination and tumor contamination, respectively. Panels B-N, D-P and F-R represent the recall. The bar plots S, T and U represent the percentage of shared or unshared variants detected by each calling method. The data are averaged across four coverage values (50×, 100×, 150× and 200×). Panels A1-L1 represent precision and recall for the intersection and union of methods. The six boxes at the top of the figure represent the precision. The boxes below represent the recall. Panels A1-G1/D1-J1, B1-H1/E1-K1 and C1-I1/F1-L1 contain the SNV, insertion and deletion data, respectively, for intersection/union.
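Precision and recall for a single caller, and for the intersection or union of several callers, follow directly from set operations on the call sets. The sketch below assumes variants are represented as hashable tuples such as `(chrom, pos, ref, alt)` and that a truth set is available from the simulation. The function names and example variants are hypothetical, and this is not the evaluation code used in the paper.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of calls against a truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def combine_calls(call_sets, mode):
    """Combine the call sets of several methods by intersection or union."""
    if not call_sets:
        raise ValueError("at least one call set is required")
    sets = [set(s) for s in call_sets]
    if mode == "intersection":
        return set.intersection(*sets)
    if mode == "union":
        return set.union(*sets)
    raise ValueError("mode must be 'intersection' or 'union'")


if __name__ == "__main__":
    # Hypothetical variants, for illustration only.
    truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
             ("chr2", 300, "C", "G"), ("chr2", 400, "T", "A")}
    caller_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
                ("chr3", 500, "A", "G")}
    caller_b = {("chr1", 100, "A", "T"), ("chr2", 300, "C", "G"),
                ("chr3", 500, "A", "G")}
    for mode in ("intersection", "union"):
        combined = combine_calls([caller_a, caller_b], mode)
        print(mode, precision_recall(combined, truth))
```

In this toy case the false positive is shared by both callers, so taking the intersection does not remove it, while the union recovers more true variants.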
Fig 4. Harmonic mean of precision and recall (F-measure) as a function of coverage, contamination and CNA size.
Panels A-C contain the insertion data and B-D the deletion data. The circular bar plots in panels A-B represent the F-score for detecting CNAs of different sizes (1 Mb, 5 Mb and 10 Mb) at different coverage values. The background color represents the method.
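The F-measure named in the figure title is the harmonic mean of precision and recall. A minimal sketch of that standard formula follows; the function name is illustrative, and returning 0 when both inputs are 0 is a convention chosen here, not something stated in the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    if not (0 <= precision <= 1 and 0 <= recall <= 1):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision + recall == 0:
        # Convention assumed here: define the score as 0 when nothing is correct.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_measure(0.5, 1.0))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a method with high precision but low recall (or the reverse) receives a low F-measure.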

