SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors
- PMID: 20130035
- PMCID: PMC2832826
- DOI: 10.1093/bioinformatics/btq040
Abstract
Motivation: Next-generation sequencing (NGS) has enabled whole genome and transcriptome single nucleotide variant (SNV) discovery in cancer. NGS produces millions of short sequence reads that, once aligned to a reference genome sequence, can be interpreted for the presence of SNVs. Although tools exist for SNV discovery from NGS data, none are specifically suited to work with data from tumors, where altered ploidy and tumor cellularity impact the statistical expectations of SNV discovery.
Results: We developed three implementations of a probabilistic Binomial mixture model, called SNVMix, designed to infer SNVs from NGS data from tumors. The first models allelic counts as observations and infers SNVs and model parameters using an expectation maximization (EM) algorithm; it can therefore adjust to the deviations in allelic frequencies inherent in genomically unstable tumor genomes. The second models the nucleotide and mapping qualities of the reads by probabilistically weighting the contribution of each read/nucleotide to the inference of an SNV according to the confidence in the base call and the read alignment. The third combines filtering of low-quality data with probabilistic weighting of the qualities. We quantitatively evaluated these approaches on 16 ovarian cancer RNASeq datasets with matched genotyping arrays and on a human breast cancer genome sequenced to >40x (haploid) coverage with ground truth data, and show systematically that the SNVMix models outperform competing approaches.
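To make the first approach concrete, the following is a minimal sketch of fitting a Binomial mixture to allelic counts with EM. It is an illustration under stated assumptions, not the SNVMix implementation: the three-component setup (homozygous reference, heterozygous, homozygous variant), the function names `log_binom_pmf` and `em_binomial_mixture`, the starting values, and the toy counts are all hypothetical, and it uses plain maximum likelihood updates, omitting any priors or other details of the published model.

```python
import math


def log_binom_pmf(a, n, mu):
    """Log-probability of a reference reads out of n under Binomial(n, mu)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
    return log_choose + a * math.log(mu) + (n - a) * math.log(1.0 - mu)


def em_binomial_mixture(counts, mu, pi, n_iter=50):
    """Fit a Binomial mixture by EM (illustrative sketch only).

    counts: list of (reference_read_count, depth) pairs, one per position.
    mu:     initial per-component probability of observing the reference allele.
    pi:     initial mixing proportions (should sum to 1).
    Returns (mu, pi, resp), where resp[i][k] is the posterior probability
    that position i belongs to component k.
    """
    mu, pi = list(mu), list(pi)
    n_comp = len(mu)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each position,
        # computed in log space and normalised after subtracting the maximum.
        resp = []
        for a, n in counts:
            logw = [math.log(pi[k]) + log_binom_pmf(a, n, mu[k])
                    for k in range(n_comp)]
            top = max(logw)
            w = [math.exp(x - top) for x in logw]
            total = sum(w)
            resp.append([x / total for x in w])

        # M-step: re-estimate mixing proportions and allele frequencies
        # from the responsibility-weighted counts.
        for k in range(n_comp):
            r_k = sum(r[k] for r in resp)
            a_k = sum(r[k] * a for r, (a, _) in zip(resp, counts))
            n_k = sum(r[k] * n for r, (_, n) in zip(resp, counts))
            pi[k] = max(r_k / len(counts), 1e-12)  # keep log(pi) finite
            if n_k > 0:
                # Clamp away from 0 and 1 so the logarithms stay defined.
                mu[k] = min(max(a_k / n_k, 1e-6), 1.0 - 1e-6)
    return mu, pi, resp


# Toy data (made up): two positions each that look homozygous reference,
# heterozygous, and homozygous variant, all at depth 30.
toy_counts = [(30, 30), (29, 30), (15, 30), (14, 30), (1, 30), (0, 30)]
fitted_mu, fitted_pi, posteriors = em_binomial_mixture(
    toy_counts, mu=[0.9, 0.5, 0.1], pi=[1 / 3, 1 / 3, 1 / 3]
)
calls = [max(range(3), key=lambda k: r[k]) for r in posteriors]
```

Because the allele frequencies `mu` are re-estimated from the data rather than fixed, a fit of this kind can shift the expected heterozygous fraction away from 0.5, which is the sort of adjustment the abstract attributes to the EM-based model.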
Availability: Software and data are available at http://compbio.bccrc.ca
Contact: sshah@bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.