Ann Appl Stat. 2014 Mar 1;8(1):352-376.
doi: 10.1214/13-AOAS690.

JOINT ANALYSIS OF SNP AND GENE EXPRESSION DATA IN GENETIC ASSOCIATION STUDIES OF COMPLEX DISEASES

Yen-Tsung Huang et al. Ann Appl Stat.

Abstract

Genetic association studies have been a popular approach for assessing the association between common Single Nucleotide Polymorphisms (SNPs) and complex diseases. However, other genomic data involved in the mechanism from SNPs to disease, e.g., gene expression, are usually neglected in these association studies. In this paper, we propose to exploit gene expression information to more powerfully test the association between SNPs and diseases by jointly modeling the relations among SNPs, gene expression and disease. We propose a variance component test for the total effect of SNPs and a gene expression on disease risk. We cast the test within the causal mediation analysis framework with the gene expression as a potential mediator. For eQTL SNPs, the use of gene expression information can enhance power to test for the total effect of a SNP-set on disease risk, which is the combined direct effect of the SNPs and their indirect effect mediated through the gene expression. We show that the test statistic under the null hypothesis follows a mixture of χ² distributions, which can be evaluated analytically or empirically using the resampling-based perturbation method. We construct tests for each of three disease models, in which disease risk is determined by SNPs only, by SNPs and gene expression, or by SNPs, gene expression and their interactions. As the true disease model is unknown in practice, we further propose an omnibus test to accommodate different underlying disease models. We evaluate the finite sample performance of the proposed methods using simulation studies, and show that our proposed test performs well and that the omnibus test nearly attains the optimal power achieved when the disease model is known and correctly specified. We apply our method to re-analyze the overall effect of the SNP-set and expression of the ORMDL3 gene on the risk of asthma.

Keywords: Causal Inference; Data Integration; Mediation Analysis; Mixed Models; SNP Set Analysis; Score Test; Variance Component Test.
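
The abstract states that the null distribution of the test statistic is a mixture of χ² distributions that can be evaluated analytically or empirically. As a rough illustration only, the sketch below estimates the tail probability of a weighted sum of independent χ² variables with 1 degree of freedom by plain Monte Carlo. This is neither the paper's perturbation method nor an analytic method such as Davies'; the function name `mixture_chi2_pvalue` and the example weights are hypothetical, since the abstract does not give the mixing weights.

```python
import random


def mixture_chi2_pvalue(q_obs, weights, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(sum_j w_j * Z_j^2 > q_obs), Z_j ~ N(0, 1).

    `weights` are the mixing weights of the chi-square mixture. They are
    an assumed input here; the abstract does not specify how they are obtained.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # One draw from the mixture: weighted sum of squared standard normals.
        q = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if q > q_obs:
            exceed += 1
    return exceed / n_draws


if __name__ == "__main__":
    # Hypothetical weights and observed statistic, for illustration only.
    print(mixture_chi2_pvalue(q_obs=6.0, weights=[2.0, 1.0, 0.5]))
```

With a single weight of 1 the mixture reduces to a χ² distribution with 1 degree of freedom, so an observed value of 3.841 should give an estimate close to 0.05, which is a convenient sanity check.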

Figures

Fig 1
Causal diagram of the mediation model. S is a set of correlated exposures, e.g., a SNP set; G is a mediator, e.g., gene expression; Y is an outcome, e.g., disease (yes/no); and X denotes covariates, including true and potential confounders.
Fig 2
Empirical power. SNPs are assumed to be eQTL SNPs (δ = 1). Each figure plots the power of the proposed tests as a function of the main effect of the SNP (βS). The three figures correspond to three different true models: the model with only SNP effects, the model with main effects of SNP and gene expression but no interaction, and the model with SNP, gene expression and their interaction effects. The dashed line in (a) indicates the 5% type I error rate.
Fig 3
Simulated power curves for evaluating how different choices of causal SNP affect the power of the proposed tests. The x-axis indicates the physical location (Mb) of the 99 HapMap SNPs at 17q21. The orange vertical bar indicates the relative location of the causal SNP and the black triangles indicate the ten typed SNPs. Different lines indicate the power of different tests. The lower panel of each subfigure plots linkage disequilibrium, measured as r² ranging from 0 (white) to 1 (black).
Fig 4
Empirical power under model mis-specification. SNPs are assumed to be eQTL SNPs (δ = 1). Each figure plots the power of the proposed tests as a function of the main effect of the SNP (βS). The six figures correspond to the different true models: the model with only SNP effects ((a) and (b)), the model with main effects of SNP and gene expression ((c) and (d)), and the model with SNP, gene expression and their interaction effects ((e) and (f)). (a), (c) and (e) are simulated under $\mathrm{logit}[P(Y_i = 1 \mid S_{\mathrm{causal},i}, G_i)] = -100^{0.9} + (100 + \beta_S S_{\mathrm{causal},i} + \beta_G G_i + \gamma G_i S_{\mathrm{causal},i})^{0.9}$, and (b), (d) and (f) are simulated under the probit model $\Phi^{-1}[P(Y_i = 1 \mid S_{\mathrm{causal},i}, G_i)] = -0.2 + \beta_S S_{\mathrm{causal},i} + \beta_G G_i + \gamma G_i S_{\mathrm{causal},i}$. The dashed lines in (a) and (b) indicate the 5% type I error rate.
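
To make the two mis-specified data-generating models in the Fig 4 caption concrete, the following sketch computes the disease probability under each. It reads the first formula as −100^0.9 + (100 + η)^0.9, with η the linear predictor, which is an assumption about superscripts that appear to have been lost in the extracted caption. The function and parameter names are hypothetical, and the paper's actual simulation settings (coefficient values, how S and G are generated) are not given here.

```python
import math


def linear_predictor(s, g, beta_s, beta_g, gamma):
    """SNP main effect, expression main effect, and their interaction."""
    return beta_s * s + beta_g * g + gamma * g * s


def prob_power_logit(s, g, beta_s, beta_g, gamma, shift=100.0, power=0.9):
    """P(Y = 1) when logit(P) = -shift^power + (shift + eta)^power.

    Assumed reading of the caption; requires shift + eta > 0 so that the
    fractional power is defined.
    """
    eta = linear_predictor(s, g, beta_s, beta_g, gamma)
    logit_p = -(shift ** power) + (shift + eta) ** power
    return 1.0 / (1.0 + math.exp(-logit_p))


def prob_probit(s, g, beta_s, beta_g, gamma, intercept=-0.2):
    """P(Y = 1) under the probit model: Phi(intercept + eta)."""
    eta = intercept + linear_predictor(s, g, beta_s, beta_g, gamma)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical genotype, expression level and coefficients.
    print(prob_power_logit(s=1, g=0.5, beta_s=0.2, beta_g=0.3, gamma=0.1))
    print(prob_probit(s=1, g=0.5, beta_s=0.2, beta_g=0.3, gamma=0.1))
```

Under this reading the power-transformed logit equals zero when the linear predictor is zero, so the transformation bends the linear predictor rather than shifting the baseline risk.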
