Comparative Study
Biostatistics. 2012 Sep;13(4):776-90.
doi: 10.1093/biostatistics/kxs015. Epub 2012 Jun 25.

Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test

Tianxi Cai et al. Biostatistics. 2012 Sep.

Abstract

In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429-435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079-1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics 9, 292-2; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom.
In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk.
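
The weighting and perturbation ideas described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the weight function, the form of the statistic, or the perturbation scheme, so the hard-threshold weight, the intercept-only linear null model, the standard-normal multipliers, and the names `adaptive_score_sketch`, `threshold`, and `n_perturb` are all assumptions made for illustration.

```python
import numpy as np

def adaptive_score_sketch(y, G, threshold=1.5, n_perturb=2000, seed=0):
    """Illustrative Z-score-weighted marker-set statistic with a perturbation p-value.

    Assumptions (not taken from the paper): a continuous outcome, an
    intercept-only null model, a hard-threshold weight on |Z|, and
    standard-normal perturbation multipliers.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)

    # Residuals under the null model of no marker effect.
    resid = y - y.mean()
    G_centered = G - G.mean(axis=0)

    # Per-subject contributions to each marker's score (n subjects x p markers).
    contrib = G_centered * resid[:, None]
    score = contrib.sum(axis=0)
    scale = np.sqrt((contrib ** 2).sum(axis=0))
    z = score / scale

    # Hypothetical weighting: keep markers whose |Z| exceeds the threshold.
    observed = float(np.sum((np.abs(z) > threshold) * z ** 2))

    # Perturbation: multiply each subject's contribution by an independent
    # N(0, 1) draw and recompute the statistic to approximate its null law.
    xi = rng.standard_normal((n_perturb, y.shape[0]))
    z_star = (xi @ contrib) / scale
    null = np.sum((np.abs(z_star) > threshold) * z_star ** 2, axis=1)

    p_value = (1 + np.sum(null >= observed)) / (1 + n_perturb)
    return observed, float(p_value)

# Simulated example: 200 subjects, 10 independent markers, one causal marker.
rng = np.random.default_rng(1)
G = rng.standard_normal((200, 10))
y = G[:, 0] + rng.standard_normal(200)
stat, p = adaptive_score_sketch(y, G)
```

The simulated data mimic the sparse, weakly correlated setting in which the abstract reports that the adaptive test gains power; with strongly correlated markers, the abstract indicates that the EB-based score tests are the more natural choice.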

Figures

Figure 1.
Theoretical power curves for the adaptive (solid curves), EB-score (dotted curves), and omnibus (dot-dashed curves; combining [formula image] and [formula image]) tests under local alternatives with various levels of correlation (corr.): 0.0 (thin gray curves); 0.2 (black curves); and 0.5 (thick gray curves).
Figure 2.
Empirical power (in %) for various tests using the ASAH1 gene under local alternatives, averaged over all the choices of ι’s. (a) q=12, b_ASAH1=3.5 (14% sparsity); (b) q=7, b_ASAH1=4.1 (50% sparsity); (c) q=2, b_ASAH1=7.1 (86% sparsity); and (d) q=1, b_ASAH1=10.6 (93% sparsity).
Figure 3.
Empirical power (in %) for various tests using the FGFR2 gene under local alternatives, averaged over the set of ι with low ℘(⋅) (low) and the set of ι with moderate ℘(⋅) (moderate). Four settings of q and b were considered: (a) q=25, b_FGFR2=3.5 (19% sparsity); (b) q=16, b_FGFR2=2.8 (48% sparsity); (c) q=4, b_FGFR2=5.4 (87% sparsity); and (d) q=1, b_FGFR2=10.6 (97% sparsity).

