Genet Epidemiol. 2017 Feb;41(2):108-121.
doi: 10.1002/gepi.22024. Epub 2016 Nov 25.

Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure

Yun Joo Yoo et al. Genet Epidemiol. 2017 Feb.

Abstract

By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. In simulation studies of 1,000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared with minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of the other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1,000 genes is less than that of other methods, suggesting that MLC is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations.

Keywords: common variants; linkage disequilibrium; multibin linear combination test; multivariant test; quantitative trait.
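The cluster-then-combine construction described in the abstract can be sketched as a quadratic form. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `solve` and `mlc_statistic` are hypothetical, each cluster is assumed to use unit (0/1) weights with a +1/-1 recoding sign per variant, and the coefficient vector `beta` and its covariance `cov` are assumed to come from an already fitted multiple regression. With B the variant-by-cluster weight matrix, the sketch computes T = (B'β)'(B'VB)⁻¹(B'β), whose df equals the number of clusters k; referring T to a χ² distribution with k df is the usual large-sample approximation.

```python
"""Minimal sketch of an MLC-style quadratic statistic (hypothetical helpers).

Assumptions, not taken from the paper's code: unit weights within each
cluster, a known coefficient covariance matrix V, cluster labels 0..k-1,
and a +1/-1 recoding sign for every variant.
"""
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mlc_statistic(beta: Sequence[float], cov: Sequence[Sequence[float]],
                  cluster: Sequence[int], sign: Sequence[int]) -> Tuple[float, int]:
    """Return (T, df) with T = s' M^{-1} s, s = B'beta, M = B'VB, df = k."""
    k = max(cluster) + 1  # number of clusters
    p = len(beta)
    # B[j][c] = sign[j] if variant j belongs to cluster c, else 0
    B = [[sign[j] if cluster[j] == c else 0 for c in range(k)] for j in range(p)]
    # within-cluster linear combination of (recoded) effects
    s = [sum(B[j][c] * beta[j] for j in range(p)) for c in range(k)]
    # covariance of the cluster-level combinations
    M = [[sum(B[i][c1] * cov[i][j] * B[j][c2] for i in range(p) for j in range(p))
          for c2 in range(k)] for c1 in range(k)]
    x = solve(M, s)
    T = sum(si * xi for si, xi in zip(s, x))
    return T, k
```

For example, with `beta = [0.2, 0.3, 0.1]`, a diagonal `cov` with entries 0.01, `cluster = [0, 0, 1]` and all signs +1, the statistic is 0.5²/0.02 + 0.1²/0.01 = 13.5 on 2 df. Flipping both a variant's coding sign and the sign of its coefficient leaves T unchanged; what the recoding controls is that effects within a cluster add rather than cancel.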

Figures

Figure 1
Clustering of SNPs in the DCCT/EDIC CETP gene data obtained by applying the CLQ algorithm to the linkage disequilibrium (r) pattern. Edges with |r| < 0.5 are removed. SNPs in the same cluster have the same color. The cluster construction threshold for the CLQ algorithm was set at c = 0.5.
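The thresholding step in this caption can be illustrated with a simplified, hypothetical stand-in for LD-based clustering. It is not the CLQ algorithm itself, whose details are given in the paper: the function `greedy_ld_cliques` (a hypothetical name) drops edges with |r| < c, greedily grows cliques of the remaining graph starting from the highest-degree SNP, and recodes each added SNP so that its correlation with the cluster's seed SNP is positive. Aligning signs to the seed does not guarantee that every pairwise correlation in a cluster is positive, which is why the requirement in the abstract is stated for the majority of pairs.

```python
"""Simplified stand-in for LD-based clustering (NOT the authors' CLQ algorithm).

Assumptions: r is a symmetric SNP-by-SNP correlation matrix; SNP pairs with
|r| < c have no edge; clusters are grown greedily as cliques of the
remaining graph, highest-degree SNP first.
"""
from typing import List, Sequence, Tuple


def greedy_ld_cliques(r: Sequence[Sequence[float]],
                      c: float = 0.5) -> Tuple[List[int], List[int]]:
    p = len(r)
    # keep an edge only where |r| >= c
    adj = [[j != i and abs(r[i][j]) >= c for j in range(p)] for i in range(p)]
    order = sorted(range(p), key=lambda i: -sum(adj[i]))  # highest degree first
    cluster = [-1] * p
    sign = [1] * p
    k = 0
    for seed in order:
        if cluster[seed] != -1:
            continue
        members = [seed]
        cluster[seed] = k
        for j in order:
            # add j only if it is linked to every current member (clique)
            if cluster[j] == -1 and all(adj[j][m] for m in members):
                members.append(j)
                cluster[j] = k
                # recode j so that its correlation with the seed is positive
                sign[j] = 1 if r[seed][j] > 0 else -1
        k += 1
    return cluster, sign
```

With four SNPs where SNP 0 and SNP 1 correlate at 0.8, SNP 2 correlates at -0.7 and -0.6 with them, and SNP 3 is nearly independent, the sketch returns clusters `[0, 0, 0, 1]` and signs `[1, 1, -1, 1]`, so all three correlations in the first cluster become positive after recoding.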
Figure 2
Simulation study results (Models 1–5): Average empirical power of MLC test statistics and other gene‐based statistics for 1,000 genes at nominal level α = 0.05 (N = 1,000 simulation replicates used to estimate power for each gene)
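Empirical power in this figure is a proportion over N = 1,000 replicates, so its Monte Carlo precision follows from the binomial standard error. A minimal sketch, assuming each replicate yields one P-value and that rejection means P < α (the function name `empirical_power` is hypothetical):

```python
import math
from typing import Sequence, Tuple


def empirical_power(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Fraction of replicates rejected at level alpha, with its binomial standard error."""
    n = len(p_values)
    power = sum(p < alpha for p in p_values) / n  # proportion of rejections
    se = math.sqrt(power * (1 - power) / n)       # Monte Carlo standard error
    return power, se
```

At a true power of 0.6, N = 1,000 replicates give a standard error of sqrt(0.6 × 0.4 / 1000) ≈ 0.015 for each gene-specific estimate.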
Figure 3
Simulation study results (Model 6): Distribution of gene‐specific empirical power of MLC‐B (c = 0.5 and 0.7) and other gene‐based statistics obtained for 1,000 genes at nominal level α = 0.05, stratified by the number of causal SNPs. Each box plot shows five points: the median, the first and third quartiles (computed as Tukey's “hinges”), and the end points of the whiskers. The whiskers extend to the most extreme values lying no more than 1.5 times the interquartile range from the box. Outliers are shown in sand color. Note that the simulation error variance was adjusted separately for each gene to obtain 60% Wald test power at a sample size of n = 1,000, assuming the regression analysis includes the causal SNPs. Upper panel (a): causal SNPs included in the regression analysis; lower panel (b): causal SNPs excluded from the regression analysis.
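The box-plot summary described in this caption can be computed for any vector of gene-specific power values. A sketch using only the Python standard library (the helper name `tukey_box` is hypothetical): hinges are taken as the medians of the lower and upper halves of the sorted data, each half including the median when the count is odd, and whiskers end at the most extreme observations within 1.5 × IQR of the hinges; anything beyond is returned as an outlier. That this matches exactly how the figure was drawn is an assumption.

```python
from statistics import median
from typing import List, Sequence, Tuple


def tukey_box(values: Sequence[float]) -> Tuple[float, float, float, float, float, List[float]]:
    """Return (lower whisker, lower hinge, median, upper hinge, upper whisker, outliers)."""
    x = sorted(values)
    n = len(x)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_hinge = median(x[:half])
    upper_hinge = median(x[n - half:])
    iqr = upper_hinge - lower_hinge  # spread between the hinges
    lo_fence = lower_hinge - 1.5 * iqr
    hi_fence = upper_hinge + 1.5 * iqr
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    outliers = [v for v in x if v < lo_fence or v > hi_fence]
    # whiskers stop at the most extreme observations inside the fences
    return inside[0], lower_hinge, median(x), upper_hinge, inside[-1], outliers
```

For the nine values 0.1, 0.2, ..., 0.8, 2.0 the hinges are 0.3 and 0.7, the whiskers end at 0.1 and 0.8, and 2.0 is flagged as an outlier.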
