BMC Bioinformatics. 2011 May 14;12:156.
doi: 10.1186/1471-2105-12-156.

A hidden two-locus disease association pattern in genome-wide association studies

Can Yang et al. BMC Bioinformatics. 2011.

Abstract

Background: Recent association analyses in genome-wide association studies (GWAS) focus mainly on single-locus association tests (marginal tests) and two-locus interaction detection. These methods have provided strong evidence of associations between genetic variants and complex diseases. However, there is a type of association pattern that often occurs within local regions of the genome and is unlikely to be detected by either marginal tests or interaction tests. This pattern involves a group of correlated single-nucleotide polymorphisms (SNPs). The correlation among the SNPs can lead to weak marginal effects, and interaction plays no role in the pattern. The phenomenon is due to unfaithfulness: because the effects of correlated SNPs cancel one another, their marginal effects do not faithfully express their significant joint effects.
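The cancellation described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it assumes a linear model with continuous, unit-variance predictors rather than real genotypes, and the function names (`simulate`, `marginal_slope`, `joint_slopes`) and parameter values are hypothetical. Under these assumptions the marginal slope of Y on X1 is β1 + ρβ2, so with β1 = 1, β2 = -1 and ρ = 0.9 each marginal effect shrinks to a magnitude of about 0.1 while the joint effects stay near 1 and -1.

```python
import random


def simulate(n, rho, beta1, beta2, seed=0):
    """Draw (x1, x2, y) with corr(x1, x2) = rho and y = beta1*x1 + beta2*x2 + noise."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        # b has unit variance and correlation rho with a
        b = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
        x1.append(a)
        x2.append(b)
        y.append(beta1 * a + beta2 * b + rng.gauss(0, 1))
    return x1, x2, y


def cov(u, v):
    """Sample covariance (divides by n, which is enough for this illustration)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def marginal_slope(x, y):
    """Least-squares slope of y regressed on x alone (with intercept)."""
    return cov(x, y) / cov(x, x)


def joint_slopes(x1, x2, y):
    """Least-squares slopes of y regressed on x1 and x2 together (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


if __name__ == "__main__":
    # Positively correlated predictors with effects of opposite sign
    x1, x2, y = simulate(n=20000, rho=0.9, beta1=1.0, beta2=-1.0, seed=1)
    print("marginal slopes:", marginal_slope(x1, y), marginal_slope(x2, y))
    print("joint slopes:   ", joint_slopes(x1, x2, y))
```

A marginal test sees only the small slopes, which is why a scan restricted to single-locus tests could miss such a pair.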

Results: In this paper, we develop a computational method to detect this association pattern masked by unfaithfulness. We applied our method to seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). For each data set, examining all pairs of SNPs takes about one week. Based on the empirical results from these real data, we show that this type of association masked by unfaithfulness is widespread in GWAS.
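An exhaustive scan over all SNP pairs can be sketched as follows. The abstract does not specify the test statistic, so this sketch swaps in the R² of a linear regression as a stand-in, and the function names and thresholds (`hidden_pairs`, `joint_min`, `marginal_max`) are hypothetical illustrations rather than the interface of the authors' software. A pair is flagged when its joint R² is high while both marginal R² values are low.

```python
from itertools import combinations


def cov(u, v):
    """Sample covariance (divides by n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def r2_marginal(x, y):
    """R^2 of y regressed on a single SNP; 0.0 if either variable is constant."""
    sxx, syy = cov(x, x), cov(y, y)
    if sxx == 0 or syy == 0:
        return 0.0
    return cov(x, y) ** 2 / (sxx * syy)


def r2_joint(x1, x2, y):
    """R^2 of y regressed on two SNPs together (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y, syy = cov(x1, y), cov(x2, y), cov(y, y)
    det = s11 * s22 - s12 ** 2
    if syy == 0 or det <= 1e-12:
        # Collinear or constant SNPs: the pair adds nothing beyond one SNP
        return max(r2_marginal(x1, y), r2_marginal(x2, y))
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy


def hidden_pairs(genotypes, phenotype, joint_min, marginal_max):
    """Return (i, j, joint R^2) for pairs with a strong joint but weak marginal signal."""
    marginal = [r2_marginal(g, phenotype) for g in genotypes]
    hits = []
    for i, j in combinations(range(len(genotypes)), 2):
        if max(marginal[i], marginal[j]) > marginal_max:
            continue  # at least one SNP is already visible to a marginal test
        joint = r2_joint(genotypes[i], genotypes[j], phenotype)
        if joint >= joint_min:
            hits.append((i, j, joint))
    return hits
```

The loop visits n(n-1)/2 pairs for n SNPs, so the cost grows quadratically with the number of SNPs, which is consistent with the long running time reported for a genome-wide scan.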

Conclusions: These newly identified associations enrich the discoveries of GWAS and may provide new insights into both the analysis of tagSNPs and the experimental design of GWAS. Because these associations are easily missed by existing analysis tools, we can connect only some of them to publicly available findings from other association studies. Because independent data sets are currently limited, we also have difficulty replicating these findings. The biological implications need further investigation.

Availability: The software is freely available at http://bioinformatics.ust.hk/hidden_pattern_finder.zip.


Figures

Figure 1
Illustration of unfaithfulness in association studies. There are three regression models in each scenario: Y ~ β1X1 + β2X2, Y ~ β̃1X1 and Y ~ β̃2X2. In this figure, the marginal coefficients β̃1 and β̃2 are shown as projections (marked in bold red) of Y on X1 and X2, respectively. (a) X1 is not correlated with X2. (b) X1 is positively correlated with X2. (c) X1 is negatively correlated with X2. (d) X1 is positively correlated with X2 but the sign of β1 is the opposite of the sign of β2. Scenario (c) and Scenario (d) illustrate unfaithfulness.
Figure 2
Performance comparison of four methods: marginal association tests, Lasso, BEAM and the proposed exhaustive two-locus joint analysis. One hundred data sets are generated under each parameter setting, each with 1000 simulated samples (500 cases and 500 controls). The power is calculated as the proportion of the 100 data sets in which the disease-associated SNPs are detected.
Figure 3
Distributions of genotypes of rs668860 and rs10873672 in the bipolar disorder data set and the odds ratios computed for combined genotypes of these two SNPs. Left panel: The distribution of genotypes of rs668860 and rs10873672 in case samples. Middle panel: The distribution of genotypes of rs668860 and rs10873672 in control samples. Right panel: The estimated odds ratios for the combinations of rs668860 and rs10873672. The genotype combination "AA/TT" is used as the reference. The genotype combination "TT/CT" has a significantly higher odds ratio than the other genotype combinations.
Figure 4
Analysis results for the local region of the bipolar disorder (BD) data set located by rs668860, rs10873672 and rs6691970. (a) The enriched signal after imputation: the intensity shows the -log10P value given by the joint regression. (b) The LD structure of this local region. (c) The -log10P value obtained using single-SNP analysis. (d) The locations of rs668860, rs10873672 and rs6691970.
Figure 5
Analysis results for the local region of the CAD data set located by rs7162070, rs1876853, rs8029602, rs16969475 and rs16969478. (a) The enriched signal after imputation: the -log10P value given by the joint regression. (b) The LD structure (r²) in the same region. (c) The -log10P value obtained using single-SNP analysis. (d) The locations of the genotyped SNPs rs7162070, rs1876853, rs8029602, rs16969475 and rs16969478.


