Sci Rep. 2016 Sep 27;6:33735.
doi: 10.1038/srep33735.

Next Generation Sequencing of Pooled Samples: Guideline for Variants' Filtering


Santosh Anand et al. Sci Rep. 2016.

Erratum in

Abstract

Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is a cost- and time-effective alternative in which DNA from several individuals is pooled for sequencing. However, pooling DNA creates new challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, possibly giving rise to false-positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data for the individual subjects in the pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic in nature and could easily be applied in other Pool-seq experiments.
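To make the low-frequency problem concrete, the following minimal Python sketch works through the pool arithmetic from the abstract. It assumes diploid autosomal loci and equal contribution of each individual's DNA to the pool; neither assumption is stated in the abstract. The function names `chromosomes_per_pool` and `pool_allele_frequency` are hypothetical helpers introduced for illustration, not part of the authors' pipeline.

```python
from fractions import Fraction

INDIVIDUALS = 996  # from the abstract
POOLS = 83         # from the abstract
PLOIDY = 2         # assumption: diploid autosomal loci


def chromosomes_per_pool(individuals: int, pools: int, ploidy: int = PLOIDY) -> int:
    """Number of chromosome copies represented in one pool."""
    per_pool, remainder = divmod(individuals, pools)
    if remainder:
        raise ValueError("individuals do not divide evenly into pools")
    return per_pool * ploidy


def pool_allele_frequency(alt_reads: int, depth: int) -> float:
    """Naive AF estimate for one pool: fraction of reads carrying the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    return alt_reads / depth


n_chrom = chromosomes_per_pool(INDIVIDUALS, POOLS)
# Smallest non-zero true AF in a pool: one alternate allele among all copies.
min_af = Fraction(1, n_chrom)
print(f"{n_chrom} chromosomes per pool, minimum non-zero AF = {float(min_af):.4f}")
```

Under these assumptions, a single heterozygous carrier in a pool of 12 contributes one allele in 24, so a true singleton is expected at a read fraction of roughly 4%. An observed fraction well below that level is more consistent with sequencing error than with a real allele in the pool.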


Figures

Figure 1
(a) Allele Frequency distribution of all variants. (b) Distribution of variants according to the number of pools in which they are found.
Figure 2. Comparison of poolAF with AF of 1000genomes.
(a) Histogram of differences between poolAF and the 1000genomes European population [1 kg(EUR)]. Minimum: −0.494; 1st Quartile: 0.005; Median: 0.000; Mean: −0.002; 3rd Quartile: 0.005; Maximum: 0.308. (b) Boxplot of differences: left panel, 1000genomes_ALL (delta.kg.all); right panel, 1000genomes_EUR (delta.kg.eur). The overall similarity between poolAF and 1000genomes is higher for the 1000genomes_EUR population, as shown by the smaller IQR and narrower spread of the data.
Figure 3. Pool sequencing AF vs. AF obtained from individual genotyping by ImmunoChip SNP-array.
(a) Correlation scatterplot. The points are colour coded according to the absolute difference (delta) between the two frequencies; the number of points for corresponding ranges of delta is shown in top left inset. (b) Pool-by-pool correlation. A representative scatter plot for one of the pools (12 individuals) for 1535 SNVs is shown.
Figure 4. QUAL(ity) score distribution of all variants.
The dashed red vertical line denotes the ad-hoc threshold of low-quality (QUAL = 100).
Figure 5. Density distributions of QUAL(ity) scores of variants found in public databases (in.db), and those not found in any database (novel).
(a) Distribution for all variants (QUAL > 0). (b) Distribution for variants having QUAL > 100.
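The comparison of QUAL distributions in Figures 4 and 5 can be illustrated with a short Python sketch. It applies the ad-hoc QUAL = 100 threshold and computes a two-sample Kolmogorov-Smirnov statistic between known (in.db) and novel variants. This is an assumption-laden illustration: the abstract does not state which quantities the authors' Kolmogorov-Smirnov filter compares, so this should not be read as their method. The QUAL values below are made up, and `ks_statistic` and `passes_qual` are hypothetical helper names.

```python
from bisect import bisect_right

QUAL_THRESHOLD = 100  # ad-hoc low-quality threshold shown in Figure 4


def passes_qual(qual: float, threshold: float = QUAL_THRESHOLD) -> bool:
    """Keep a variant only if its QUAL score exceeds the threshold."""
    return qual > threshold


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up QUAL scores, for illustration only.
in_db = [250, 400, 120, 900, 310, 80]
novel = [15, 40, 95, 130, 22, 60]

in_db_kept = [q for q in in_db if passes_qual(q)]
novel_kept = [q for q in novel if passes_qual(q)]

print("KS statistic, all variants:", round(ks_statistic(in_db, novel), 3))
print("kept after QUAL filter:", len(in_db_kept), "in.db,", len(novel_kept), "novel")
```

A statistic near 1 means the two QUAL distributions barely overlap, while a value near 0 means they are nearly indistinguishable. For real data, an established implementation that also reports a p-value would be preferable to this hand-rolled statistic.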
