Genet Epidemiol. 2009 May;33(4):308-16.
doi: 10.1002/gepi.20382.

Generalized linear modeling with regularization for detecting common disease rare haplotype association

Wei Guo et al. Genet Epidemiol. 2009 May.

Abstract

Whole genome association studies (WGAS) have surged in popularity in recent years, as technological advances have made large-scale genotyping more feasible and as exciting new results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detecting association under the common disease/rare variant (CD/RV) scenario is much harder, and current WGAS practices may be underpowered unless sample sizes are large enough. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease-haplotype association from unphased single nucleotide polymorphism data; the approach is applicable to both the CD/CV and CD/RV scenarios. We borrow a dimension-reduction method from the data mining and statistical learning literature, but use it to weed out haplotypes that are not associated with the disease, so that the associated haplotypes, especially the rare ones, can stand out and be accounted for more precisely. By using high-dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression-type methods such as the one implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM uncovers the associated variants much more frequently than hapassoc does.
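
The idea of shrinking unassociated haplotype effects toward zero so that rare associated haplotypes stand out can be illustrated with a small sketch. The code below is not the authors' rGLM implementation: it assumes haplotype phase is already known (the paper works with unphased SNP data), it uses a plain L1 (lasso) penalty fitted by proximal gradient descent as a stand-in regularizer, and every name, frequency, and effect size in it is hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def simulate_haplotype_data(n, freqs, log_odds, intercept, seed=0):
    """Simulate phased haplotype pairs and a binary disease status.

    freqs[0] is the baseline haplotype; log_odds holds one additive effect
    per non-baseline haplotype. All values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    freqs = np.asarray(freqs, dtype=float)
    haps = rng.choice(len(freqs), size=(n, 2), p=freqs)  # two haplotypes per person
    # Design matrix: copies (0, 1, or 2) of each non-baseline haplotype.
    X = np.stack([(haps == h).sum(axis=1) for h in range(1, len(freqs))], axis=1)
    X = X.astype(float)
    prob = 1.0 / (1.0 + np.exp(-(intercept + X @ np.asarray(log_odds, dtype=float))))
    y = rng.binomial(1, prob).astype(float)
    return X, y


def fit_l1_logistic(X, y, lam, n_iter=5000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    The intercept is left unpenalized; lam is the penalty weight on the
    haplotype coefficients.
    """
    n, p = X.shape
    A = np.hstack([np.ones((n, 1)), X])
    # The mean logistic loss has a gradient with Lipschitz constant
    # at most ||A||_2^2 / (4n), so 1/L is a safe step size.
    step = 4.0 * n / np.linalg.norm(A, 2) ** 2
    intercept, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = mu - y
        intercept -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return intercept, beta


if __name__ == "__main__":
    # Six haplotypes; the rarest one (frequency 0.02) carries the only effect.
    freqs = [0.50, 0.20, 0.15, 0.10, 0.03, 0.02]
    log_odds = [0.0, 0.0, 0.0, 0.0, 1.5]
    X, y = simulate_haplotype_data(2000, freqs, log_odds, intercept=-1.0)
    b0, beta = fit_l1_logistic(X, y, lam=0.01)
    print("intercept:", round(b0, 3))
    print("haplotype coefficients:", np.round(beta, 3))
```

In a sketch like this, the penalty weight `lam` controls how aggressively coefficients are set to exactly zero; in practice it would be chosen by a data-driven criterion rather than fixed by hand.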

Figures

Fig. 1
Power comparisons between the results from hapassoc and rGLM. The X-axis denotes the three one-haplotype-block settings, which correspond to 6, 9, and 12 haplotypes, respectively. All simulations were based on 500 replicates.
Fig. 2
Power comparisons between the results from hapassoc and rGLM for the fourth setting, with interacting effects between haplotypes from two blocks. The dashed curve shows the power of hapassoc based on the 7, 22, and 155 replicates (out of 500) that converged for n = 200, 400, and 1,000, respectively.
