BMC Genomics. 2005 Sep 30:6:138.
doi: 10.1186/1471-2164-6-138.

Identification of disease causing loci using an array-based genotyping approach on pooled DNA

David W Craig et al. BMC Genomics. 2005.

Abstract

Background: Pooling genomic DNA samples within clinical classes of disease, followed by genotyping on whole-genome SNP microarrays, allows for rapid and inexpensive genome-wide association studies. Key to the success of these studies are the accuracy of the allelic frequency calculations, the ability to identify false-positives arising from assay variability, and the ability to better resolve association signals through analysis of neighbouring SNPs.

Results: We report the accuracy of allelic frequency measurements on pooled genomic DNA samples by comparing these measurements to the known allelic frequencies as determined by individual genotyping. We describe modifications to the calculation of k-correction factors from relative allele signal (RAS) values that remove biases and result in more accurate allelic frequency predictions. Our results show that the least accurate SNPs, those most likely to give false-positives in an association study, are identifiable by comparing their frequencies to both those from a known database of individual genotypes and those of the pooled replicates. In a disease with a previously identified genetic mutation, we demonstrate that one can identify the disease locus through the comparison of the predicted allelic frequencies in case and control pools. Furthermore, we demonstrate improved resolution of association signals using the mean of individual test-statistics for consecutive SNPs windowed across the genome. A database of k-correction factors for predicting allelic frequencies for each SNP, derived from several thousand individually genotyped samples, is provided. Lastly, a Perl script for calculating RAS values for the Affymetrix platform is provided.
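The k-correction idea mentioned above can be sketched in a few lines. This is a minimal illustration of the commonly used heterozygote-based correction, not the modified calculation the authors describe, whose details are not given in the abstract. It assumes RAS is defined as A / (A + B) for allele signals A and B, and that k is the A-to-B signal ratio observed in heterozygous individuals. The function names are hypothetical.

```python
def k_from_heterozygote_ras(het_ras_values):
    """Estimate k for one SNP from RAS values of heterozygous (AB) individuals.

    Assumes RAS = A / (A + B), so the A:B signal ratio is RAS / (1 - RAS).
    An unbiased assay gives a mean heterozygote RAS of 0.5 and k = 1.
    """
    if not het_ras_values:
        raise ValueError("need at least one heterozygote RAS value")
    mean_ras = sum(het_ras_values) / len(het_ras_values)
    if not 0.0 < mean_ras < 1.0:
        raise ValueError("mean heterozygote RAS must lie strictly between 0 and 1")
    return mean_ras / (1.0 - mean_ras)


def corrected_allele_frequency(pool_ras, k):
    """Predict the A-allele frequency in a pool from its RAS value.

    Standard correction: f_A = A / (A + k * B), rewritten in terms of RAS.
    """
    if not 0.0 <= pool_ras <= 1.0:
        raise ValueError("pool RAS must lie between 0 and 1")
    if k <= 0.0:
        raise ValueError("k must be positive")
    return pool_ras / (pool_ras + k * (1.0 - pool_ras))
```

For example, if heterozygotes for a SNP show a mean RAS of 0.6, the assay over-reports allele A; a pool with a raw RAS of 0.6 would then be corrected to a predicted A-allele frequency of about 0.5.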

Conclusion: Our results illustrate that pooling of DNA samples is an effective initial strategy to identify a genetic locus. However, it is important to eliminate inaccurate SNPs prior to analysis by comparing them to a database of individually genotyped samples as well as by comparing them to replicates of the pool. Lastly, detection of association signals can be improved by incorporating data from neighbouring SNPs.
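The filtering step recommended here can be illustrated roughly as follows. This is only a sketch under stated assumptions: the thresholds (`max_ref_diff`, `max_rep_range`), the function name, and the use of the replicate range as the measure of replicate disagreement are all hypothetical choices, since the abstract does not state the criteria the authors used.

```python
def flag_unreliable_snps(replicate_freqs, reference_freqs,
                         max_ref_diff=0.10, max_rep_range=0.05):
    """Return the set of SNP ids whose pooled frequency estimates look unreliable.

    replicate_freqs: dict mapping SNP id -> list of predicted allele
        frequencies, one per pool replicate.
    reference_freqs: dict mapping SNP id -> allele frequency from a database
        of individually genotyped samples.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    flagged = set()
    for snp, reps in replicate_freqs.items():
        if not reps:
            flagged.add(snp)  # no usable measurement at all
            continue
        # Criterion 1: replicates of the same pool disagree with each other.
        if max(reps) - min(reps) > max_rep_range:
            flagged.add(snp)
            continue
        # Criterion 2: the pooled estimate is far from the reference frequency.
        reference = reference_freqs.get(snp)
        mean_freq = sum(reps) / len(reps)
        if reference is not None and abs(mean_freq - reference) > max_ref_diff:
            flagged.add(snp)
    return flagged
```

Note that the comparison against a reference database is only meaningful for a pool expected to resemble the reference population, such as a control pool; a true case-control difference at a disease locus should not be filtered out on that basis.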

Figures

Figure 1
Example of RAS statistics for three SNPs based on genotyping of 100 individuals with an average call rate of all SNPs greater than 98%. These example SNPs illustrate how SNP call reliability can vary both between SNPs and within the same SNP, as measured by RAS1 and RAS2 values. Blue spheres are BB individuals, orange triangles are AA individuals, green squares are AB individuals, and grey stars are "Not Called".
Figure 2
(A) Allele frequency differences between individual and pooled genotypes. Histogram representing the total number of SNPs at each allele frequency difference between individual and pooled samples. (B) Accuracy of predicted SNP frequencies increases for those SNPs that perform well on Mapping 10K individual assays and decreases for poorly performing SNPs. The mean and median absolute difference between the predicted allelic frequency and individually genotyped allelic frequencies are shown vs. the binned performance of SNPs on individual assays. Performance is ranked by the frequency of calls in a set of 3,000 individually genotyped samples.
Figure 3
Identification of the SIDDT locus from pooled genomic DNA by calculating the mean test-statistic for a rolling window of consecutive SNPs. The moving window was determined across the genome and the p-value was calculated from a distribution of 400 bootstraps of the original dataset. Mean window sizes of 1, 3, 5, 10, 15, and 20 are shown and the SIDDT locus is highlighted in yellow. The SIDDT disease locus is the top region for window sizes of 1, 5, 10, 15, and 20.
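The rolling-window analysis in this caption can be sketched as below. The caption specifies a mean test-statistic over consecutive SNPs and a p-value from 400 bootstraps, but not the resampling scheme; this sketch assumes the per-SNP statistics are resampled with replacement without regard to genomic position, which may differ from what the authors did. Function names and the `seed` parameter are hypothetical.

```python
import random
from bisect import bisect_left


def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values (SNPs in genomic order)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one SNP
        means.append(total / window)
    return means


def bootstrap_window_p_values(values, window, n_boot=400, seed=0):
    """Empirical p-value for each observed window mean.

    The null distribution pools the window means from `n_boot` resamples of
    the per-SNP statistics (an assumed scheme, see the lead-in).
    """
    rng = random.Random(seed)
    observed = rolling_mean(values, window)
    null = []
    for _ in range(n_boot):
        resampled = rng.choices(values, k=len(values))
        null.extend(rolling_mean(resampled, window))
    null.sort()
    n = len(null)
    # p = (number of null means >= observed mean, plus 1) / (n + 1)
    return [(n - bisect_left(null, m) + 1) / (n + 1) for m in observed]
```

Larger windows average out single-SNP noise at the cost of positional resolution, which is consistent with the caption's comparison across window sizes of 1 to 20.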
