Bioinformatics. 2008 Sep 1;24(17):1896-902.
doi: 10.1093/bioinformatics/btn333. Epub 2008 Jul 10.

Multimarker analysis and imputation of multiple platform pooling-based genome-wide association studies

Nils Homer et al. Bioinformatics.

Abstract

For many genome-wide association (GWA) studies, individually genotyping one million or more SNPs provides a marginal increase in coverage at a substantial cost. Much of the information gained is redundant due to the correlation structure inherent in the human genome. Pooling-based GWA studies could benefit significantly by utilizing this redundancy to reduce noise, improve the accuracy of the observations and increase genomic coverage. We introduce a measure of correlation between individual genotyping and pooling, under the same framework in which r² provides a measure of linkage disequilibrium (LD) between pairs of SNPs. We then report a new non-haplotype multimarker multi-loci method that leverages the correlation structure between SNPs in the human genome to increase the efficacy of pooling-based GWA studies. We first give a theoretical framework and derivation of our multimarker method. Next, we evaluate simulations using this multimarker approach in comparison to single marker analysis. Finally, we experimentally evaluate our method using different pools of HapMap individuals on the Illumina 450S Duo, Illumina 550K and Affymetrix 5.0 platforms for a combined total of 1 333 631 SNPs. Our results show that use of multimarker analysis reduces noise specific to pooling-based studies, allows for efficient integration of multiple microarray platforms and provides more accurate measures of significance than single marker analysis. Additionally, this approach can be extended to impute the association significance for SNPs not directly observed, using neighboring SNPs in LD. This multimarker method can now be used to cost-effectively complete pooling-based GWA studies with multiple platforms across more than one million SNPs and to impute neighboring SNPs weighted for the loss of information due to pooling.
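
For orientation, the pairwise r² that the abstract takes as its reference point is conventionally computed from phased haplotypes as D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. The sketch below shows only that standard textbook definition; it is not the pooling correlation measure or the multimarker method introduced in the paper, whose derivation is not given in this abstract. The function name `ld_r2` and the 0/1 haplotype encoding are assumptions made for illustration.

```python
def ld_r2(hap_a, hap_b):
    """Standard pairwise LD r^2 between two biallelic SNPs.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical encoding chosen for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")

    # Allele frequencies of the '1' allele at each SNP.
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n

    # Frequency of haplotypes carrying the '1' allele at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    # Disequilibrium coefficient D and its normalisation.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Two perfectly correlated SNPs and two uncorrelated SNPs.
    print(ld_r2([0, 1, 0, 1], [0, 1, 0, 1]))
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```

A value near 1 means one SNP carries almost the same information as the other, which is the redundancy the abstract proposes to exploit across neighboring SNPs.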

Figures

Fig. 1.
Application of the multimarker statistic to combined Affymetrix and Illumina data. We measure the difference between analyzing the pooling data considering each SNP individually (single marker or SA) and considering each SNP utilizing information from neighboring SNPs in LD (multimarker or MM). Curves labeled as Illumina data represent only SNPs on the Illumina platform, and curves labeled as Affymetrix data represent only SNPs on the Affymetrix platform. In (A) and (C), the dark lines lying above the bright lines indicate the greater accuracy of our multimarker method over the single marker method. In (B) and (D), we see the improvement of our multimarker method over the single marker method, since both lines are above the x-axis. (E) gives the directionality of our method, where the majority of the SNPs were moved closer to their true overall association ranks. Finally, in all the graphs the dark blue line is always higher than the dark green line, indicating that our method provides greater improvement on the Affymetrix platform than on the Illumina platform.
