A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
- PMID: 21903627
- PMCID: PMC3198575
- DOI: 10.1093/bioinformatics/btr509
Abstract
Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications call for the development of new methods for analyzing sequence data with uncertainty.
Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors.
Availability: http://samtools.sourceforge.net.
Contact: hengli@broadinstitute.org.
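As an illustration of what working "directly based on sequencing data without explicit genotyping" can look like, the sketch below estimates a site's non-reference allele frequency from per-sample genotype likelihoods with a simple expectation-maximization loop. It is a minimal sketch under stated assumptions, not the samtools implementation: it assumes diploid samples, a biallelic site and Hardy-Weinberg equilibrium, and the function name `em_allele_frequency` and its parameters are hypothetical.

```python
def em_allele_frequency(genotype_likelihoods, init=0.5, max_iter=100, tol=1e-8):
    """Estimate the non-reference allele frequency at one biallelic site.

    genotype_likelihoods: list of (l0, l1, l2) tuples, one per diploid sample,
    where lg is P(reads | sample carries g non-reference alleles).
    Assumes Hardy-Weinberg equilibrium (an assumption of this sketch).
    """
    if not genotype_likelihoods:
        raise ValueError("need at least one sample")
    freq = init
    for _ in range(max_iter):
        expected_alt = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # Weight each genotype likelihood by its Hardy-Weinberg prior.
            w0 = l0 * (1.0 - freq) ** 2
            w1 = l1 * 2.0 * freq * (1.0 - freq)
            w2 = l2 * freq ** 2
            total = w0 + w1 + w2
            if total == 0.0:
                raise ValueError("likelihoods incompatible with current frequency")
            # E-step: expected number of non-reference alleles in this sample.
            expected_alt += (w1 + 2.0 * w2) / total
        # M-step: expected non-reference alleles divided by total chromosomes.
        new_freq = expected_alt / (2.0 * len(genotype_likelihoods))
        converged = abs(new_freq - freq) < tol
        freq = new_freq
        if converged:
            break
    return freq


if __name__ == "__main__":
    # Three low-coverage samples with uncertain genotypes (made-up likelihoods).
    site = [(0.9, 0.1, 0.0), (0.3, 0.6, 0.1), (0.5, 0.5, 0.0)]
    print(em_allele_frequency(site))
```

No sample is ever assigned a hard genotype here: each one contributes its expected allele count, so samples with flat likelihoods (little or no coverage) simply carry less information instead of introducing genotyping errors.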
Similar articles
- Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics. 2011 Jun 11;12:231. doi: 10.1186/1471-2105-12-231. PMID: 21663684 Free PMC article.
- Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet. 2011 Jun;12(6):443-51. doi: 10.1038/nrg2986. PMID: 21587300 Free PMC article. Review.
- SNVSniffer: an integrated caller for germline and somatic single-nucleotide and indel mutations. BMC Syst Biol. 2016 Aug 1;10 Suppl 2(Suppl 2):47. doi: 10.1186/s12918-016-0300-5. PMID: 27489955 Free PMC article.
- SNP calling by sequencing pooled samples. BMC Bioinformatics. 2012 Sep 20;13:239. doi: 10.1186/1471-2105-13-239. PMID: 22992255 Free PMC article.
- Recent progress and challenges in population genetics of polyploid organisms: an overview of current state-of-the-art molecular and statistical tools. Mol Ecol. 2014 Jan;23(1):40-69. doi: 10.1111/mec.12581. Epub 2013 Nov 27. PMID: 24188632 Review.
Cited by
- Population genomics of invasive rodents on islands: Genetic consequences of colonization and prospects for localized synthetic gene drive. Evol Appl. 2021 Mar 10;14(5):1421-1435. doi: 10.1111/eva.13210. eCollection 2021 May. PMID: 34025776 Free PMC article.
- Limited haplotype diversity underlies polygenic trait architecture across 70 years of wheat breeding. Genome Biol. 2021 May 6;22(1):137. doi: 10.1186/s13059-021-02354-7. PMID: 33957956 Free PMC article.
- Inferring Very Recent Population Growth Rate from Population-Scale Sequencing Data: Using a Large-Sample Coalescent Estimator. Mol Biol Evol. 2015 Nov;32(11):2996-3011. doi: 10.1093/molbev/msv158. Epub 2015 Jul 16. PMID: 26187437 Free PMC article.
- Detecting adaptive introgression in human evolution using convolutional neural networks. Elife. 2021 May 25;10:e64669. doi: 10.7554/eLife.64669. PMID: 34032215 Free PMC article.
- scSNV: accurate dscRNA-seq SNV co-expression analysis using duplicate tag collapsing. Genome Biol. 2021 May 7;22(1):144. doi: 10.1186/s13059-021-02364-5. PMID: 33962667 Free PMC article.