Bioinformatics. 2010 Jun 15;26(12):i318-24.

doi: 10.1093/bioinformatics/btq214.

A statistical method for the detection of variants from next-generation resequencing of DNA pools

Vikas Bansal¹

Affiliation

¹ Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, CA 92037, USA. vbansal@scripps.edu

PMID: 20529923
PMCID: PMC2881398
DOI: 10.1093/bioinformatics/btq214

Erratum in

A statistical method for the detection of variants from next-generation resequencing of DNA pools.
Bansal V, Bansal V, Libiger O. Bioinformatics. 2016 Oct 15;32(20):3213. doi: 10.1093/bioinformatics/btw520. Epub 2016 Aug 29. PMID: 27578802.

Abstract

Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing.

Results: We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP.
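The second approach, evaluating whether the observed non-reference base calls could arise from sequencing errors alone, can be illustrated with a small calculation. The abstract does not give CRISP's formula, so the following is only a sketch under stated assumptions: each base call has an independent error probability derived from its Phred quality value, and any error is counted as a non-reference call (a simplification, since an error could produce any of three other bases). The function names are hypothetical and not part of the CRISP software.

```python
def phred_to_error_prob(quality):
    """Convert a Phred quality value to a per-base error probability."""
    return 10 ** (-quality / 10)


def prob_at_least_k_errors(error_probs, k):
    """Probability of k or more errors among independent base calls.

    Assumption (not from the paper): errors are independent with the
    given per-call probabilities, i.e. a Poisson-binomial model.
    """
    # dist[j] = probability of exactly j errors among the calls seen so far
    dist = [1.0]
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)      # this call is correct
            nxt[j + 1] += mass * p        # this call is an error
        dist = nxt
    return sum(dist[k:])


# Hypothetical example: 200 base calls at Phred 20, 5 non-reference calls seen.
probs = [phred_to_error_prob(20)] * 200
p_value = prob_at_least_k_errors(probs, 5)
```

A small `p_value` here would suggest that the non-reference calls are unlikely to be explained by sequencing errors alone under this simplified model.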

Availability: Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.

Figures

**Fig. 1.**
Illustration of how comparison of allele counts across multiple DNA pools can be used to distinguish rare variants from sequencing errors. (a) Four sequenced pools are represented as boxes with each base call shown as a circle. All five of the alternate base calls are present in a single pool. The P-value of the contingency table corresponding to four pools is 0.002 suggesting that the five base calls represent a rare SNP rather than sequencing errors. (b) Five of the nine alternate base calls are present in a single pool. The P-value of the corresponding contingency table is 0.24 indicating that the presence of five alternate base calls in a single pool is likely due to sequencing errors alone.
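The comparison in Fig. 1 can be sketched as a permutation test on a 2 x k table of alternate and reference base calls per pool. This is an illustrative stand-in, not CRISP's own procedure: the chi-square statistic, the Monte Carlo estimate, and the pool depth of 25 reads are all assumptions, because the caption does not state the read depths or the exact test. The values it produces are therefore not expected to reproduce the P-values quoted in the caption.

```python
import random


def chi_square_stat(alt_counts, depths):
    """Pearson chi-square statistic for a 2 x k table (alt/ref by pool)."""
    total_alt = sum(alt_counts)
    total = sum(depths)
    stat = 0.0
    for alt, depth in zip(alt_counts, depths):
        exp_alt = depth * total_alt / total
        exp_ref = depth - exp_alt
        if exp_alt > 0:
            stat += (alt - exp_alt) ** 2 / exp_alt
        if exp_ref > 0:
            stat += ((depth - alt) - exp_ref) ** 2 / exp_ref
    return stat


def permutation_p_value(alt_counts, depths, n_perm=20000, seed=0):
    """Estimate P(statistic >= observed) when alt calls are placed at random."""
    rng = random.Random(seed)
    observed = chi_square_stat(alt_counts, depths)
    # One pool label per read; sampling labels without replacement
    # reassigns the alternate calls to reads uniformly at random.
    labels = [pool for pool, depth in enumerate(depths) for _ in range(depth)]
    total_alt = sum(alt_counts)
    hits = 0
    for _ in range(n_perm):
        perm_counts = [0] * len(depths)
        for pool in rng.sample(labels, total_alt):
            perm_counts[pool] += 1
        # Small tolerance so tables tied with the observed one are counted.
        if chi_square_stat(perm_counts, depths) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


depths = [25, 25, 25, 25]                                # assumed reads per pool
p_concentrated = permutation_p_value([5, 0, 0, 0], depths)  # as in panel (a)
p_spread = permutation_p_value([5, 2, 1, 1], depths)        # as in panel (b)
```

Under these assumed depths, the table with all five alternate calls in one pool yields a much smaller P-value than the table in which five of nine alternate calls share a pool, which is the qualitative contrast the figure describes.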

**Fig. 2.**
Description of the algorithm CRISP for detection of SNPs using sequencing data from k DNA pools.

**Fig. 3.**
Empirical distribution of the sequence coverage per haplotype (one pool) in the two pooled sequencing datasets: (a) 50 individuals in two pools and (b) 48 individuals in six pools.

**Fig. 4.**
(a) Comparison of SNPs identified from the second pooled sequencing dataset using two independent statistics: contingency table P-value and quality values-based P-value. Only SNPs that were also identified from the individual sequencing of the 48 samples are shown. (b) Precision–recall curve for SNPs identified by CRISP from the second pooled dataset using different thresholds for the two P-values: contingency table P-value and the quality values-based P-value. The P-value thresholds (log base 10) are shown for each point on the curve.
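A point on a precision-recall curve like the one in Fig. 4(b) can be computed as follows. This is a minimal sketch with made-up sites and P-values, purely for illustration. It assumes that a site is called when either log10 P-value falls at or below its threshold; the caption does not specify how CRISP combines the two statistics, so that rule and all names below are hypothetical.

```python
def precision_recall(calls, truth, ct_threshold, qv_threshold):
    """Precision and recall of calls at the given log10 P-value thresholds.

    calls: dict mapping site -> (log10 contingency P, log10 quality P)
    truth: set of sites confirmed by individual sequencing
    Assumption: a site is called if EITHER statistic passes its threshold.
    """
    called = {
        site
        for site, (ct_logp, qv_logp) in calls.items()
        if ct_logp <= ct_threshold or qv_logp <= qv_threshold
    }
    true_pos = len(called & truth)
    precision = true_pos / len(called) if called else 1.0
    recall = true_pos / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical values, not taken from the paper.
calls = {
    "chr1:100": (-5.0, -1.0),
    "chr1:200": (-0.5, -8.0),
    "chr1:300": (-0.2, -0.3),
    "chr1:400": (-4.0, -0.1),
}
truth = {"chr1:100", "chr1:200", "chr1:500"}

loose = precision_recall(calls, truth, ct_threshold=-3.0, qv_threshold=-3.0)
strict = precision_recall(calls, truth, ct_threshold=-4.5, qv_threshold=-6.0)
```

Sweeping the two thresholds and plotting the resulting pairs traces out a curve of the kind shown in the figure.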

Grants and funding

U54RR02504-01/RR/NCRR NIH HHS/United States
