Bioinformatics. 2011 Jul 1;27(13):i77-84.
doi: 10.1093/bioinformatics/btr205.

vipR: variant identification in pooled DNA using R

Andre Altmann et al. Bioinformatics.

Abstract

Motivation: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants causing susceptibility to complex diseases. Unfortunately, preparing samples for a large number of individuals is still very cost- and labor-intensive. Recent screens for rare sequence variants have therefore been carried out in samples of pooled DNA, in which equimolar amounts of DNA from multiple individuals are mixed prior to sequencing with HTS. The resulting sequence data, however, pose a bioinformatics challenge: discriminating sequencing errors from real sequence variants present at a low frequency in the DNA pool.

Results: Our method vipR uses data from multiple DNA pools in order to compensate for differences in sequencing error rates along the sequenced region. More precisely, instead of aiming at discriminating sequence variants from sequencing errors, vipR identifies sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools using the Skellam distribution. The performance of vipR was compared with three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. Performance of the methods was computed on SNPs that were also genotyped individually using a MALDI-TOF technique. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, thus outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity.
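The idea of comparing minor allele counts between two pools with the Skellam distribution can be illustrated with a minimal sketch. This is not the vipR implementation: the function names, the use of a pooled minor-allele rate as the null hypothesis, the coverage scaling, and the two-sided p-value are all assumptions made for illustration, and the read counts in the usage lines are hypothetical. The Skellam distribution describes the difference of two independent Poisson counts, so if both pools carry only sequencing errors at a position, the difference in their minor allele counts should be consistent with it.

```python
import math


def poisson_logpmf(n, mu):
    """Log of the Poisson probability of observing n events with mean mu."""
    if mu <= 0.0:
        return 0.0 if n == 0 else -math.inf
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf(k, mu1, mu2):
    """P(N1 - N2 = k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2).

    Computed by summing over N2, truncated far out in the Poisson tail.
    """
    start = max(0, -k)
    big = max(mu1, mu2)
    stop = int(big + 12.0 * math.sqrt(big) + 30.0) + abs(k)
    return sum(
        math.exp(poisson_logpmf(n2 + k, mu1) + poisson_logpmf(n2, mu2))
        for n2 in range(start, stop + 1)
    )


def skellam_two_sided_p(minor1, cov1, minor2, cov2):
    """Two-sided p-value for a difference in minor allele counts.

    Assumption (not from the paper): under the null hypothesis both pools
    share one minor-allele rate, estimated from the combined counts.
    """
    rate = (minor1 + minor2) / (cov1 + cov2)
    mu1, mu2 = cov1 * rate, cov2 * rate
    d = minor1 - minor2
    total = mu1 + mu2
    span = int(total + 12.0 * math.sqrt(total) + 30.0) + abs(d)
    lower = sum(skellam_pmf(k, mu1, mu2) for k in range(-span, d + 1))
    upper = sum(skellam_pmf(k, mu1, mu2) for k in range(d, span + 1))
    return min(1.0, 2.0 * min(lower, upper))


# Hypothetical position covered 1000x in each of two pools.
p_noise = skellam_two_sided_p(3, 1000, 3, 1000)     # same count in both pools
p_variant = skellam_two_sided_p(30, 1000, 3, 1000)  # excess in the first pool
print(p_noise, p_variant)
```

A position whose minor allele count is similar in both pools yields a large p-value, while a position with a clear excess in one pool yields a small one. This is the sense in which the comparison between pools compensates for a position-specific error rate that affects all pools alike.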

Availability: The code of vipR is freely available via: http://sourceforge.net/projects/htsvipr/

Contact: altmann@mpipsykl.mpg.de.

Figures

Fig. 1.
Statistical power of the Skellam and the Poisson distribution. (a) Statistical power of both models as a function of coverage, with varying allele frequency and a fixed error rate of 2.7×10⁻³. (b) Statistical power of both models as a function of coverage, with varying error rate (noise) and a fixed allele frequency [formula image]. (c) Statistical power on real data for the Skellam model (black solid line) using one control pool and one case pool, and for the Poisson model applied separately to one case pool (blue solid line) and one control pool (orange dashed line).
Fig. 2.
Scatter plot between MAFs obtained by HTS and MALDI-TOF. (a) SNPs from different validation sets are represented by different symbols, and allele frequencies in the different DNA pools are color coded. (b) Like (a) but zoomed in on allele frequencies below 0.05.
Fig. 3.
Runtime of variant calling algorithms on the TMEM132D dataset as a function of the number of pools. Time was measured in seconds on a single Intel core at 2.67 GHz (with 6 GB memory).
