Am J Hum Genet. 2014 May 1;94(5):662-76.
doi: 10.1016/j.ajhg.2014.03.016. Epub 2014 Apr 17.

Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies

Hugues Aschard et al. Am J Hum Genet. 2014.

Abstract

Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of correlated variables. PCA is usually applied as a dimension reduction method: the few top principal components (PCs) explaining most of total trait variance are tested for association with a predictor of interest, and the remaining components are not analyzed. In this study we review the theoretical basis of PCA and describe the behavior of PCA when testing for association between a SNP and correlated traits. We then use simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits. We show that contrary to widespread practice, testing only the top PCs often has low power, whereas combining signal across all PCs can have greater power. This power gain is primarily due to increased power to detect genetic variants with opposite effects on positively correlated traits and variants that are exclusively associated with a single trait. Relative to other methods, the combined-PC approach has close to optimal power in all scenarios considered while offering more flexibility and more robustness to potential confounders. Finally, we apply the proposed PCA strategy to the genome-wide association study of five correlated coagulation traits where we identify two candidate SNPs that were not found by the standard approach.
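The combined-PC idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact combination rule, so the sketch assumes that each PC is tested with a 1-df score-type statistic (N times the squared correlation with the genotype) and that the statistics are summed and compared to a chi-square distribution with K degrees of freedom, which relies on the PCs being uncorrelated. The function name `combined_pc_test` and the simulated effect sizes are hypothetical.

```python
import numpy as np
from scipy import stats


def combined_pc_test(Y, g):
    """Test a SNP against all principal components of correlated traits.

    Y : (N, K) array of phenotypes; g : (N,) array of genotype dosages.
    Returns the per-PC 1-df statistics, their sum, and the K-df p-value.
    Assumed combination rule (not taken from the abstract): sum of
    per-PC chi-square statistics.
    """
    Y = np.asarray(Y, dtype=float)
    g = np.asarray(g, dtype=float)
    n_samples, n_traits = Y.shape

    # Standardize each trait, then diagonalize the trait correlation matrix.
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    pcs = Z @ eigvecs[:, order]            # PC scores, mutually uncorrelated

    # Pearson correlation of each PC with the centered genotype.
    g_centered = g - g.mean()
    r = (pcs.T @ g_centered) / (
        np.linalg.norm(pcs, axis=0) * np.linalg.norm(g_centered)
    )

    chi2_per_pc = n_samples * r**2         # approx. chi-square(1) under the null
    combined_stat = chi2_per_pc.sum()      # approx. chi-square(K) under the null
    p_value = stats.chi2.sf(combined_stat, df=n_traits)
    return chi2_per_pc, combined_stat, p_value


# Simulated example: two positively correlated traits, with a SNP that has
# opposite effects on them (one of the scenarios named in the abstract).
rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, size=n)
shared = rng.normal(size=n)                # shared factor induces correlation
Y = np.column_stack([shared + rng.normal(size=n),
                     shared + rng.normal(size=n)])
Y[:, 0] += 0.1 * g                         # hypothetical effect sizes
Y[:, 1] -= 0.1 * g

chi2_per_pc, combined_stat, p_value = combined_pc_test(Y, g)
print("per-PC statistics:", chi2_per_pc)
print("combined statistic:", combined_stat, "p-value:", p_value)
```

In this simulated scenario the opposite effects largely cancel along the first PC, so most of the signal is expected in the second PC. A test restricted to the top PC would miss it, whereas the summed statistic retains it.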

Figures

Figure 1
Power to Detect a SNP Associated with a Single Trait in a Bivariate Analysis. Power to detect the SNP associated with Y1 based on the tests of PC1, PC2, the combined PCs, and Y1 for different sample sizes and genetic effects (A and B), and proportion of phenotypic variance explained by PC1 and PC2 (C). The power of each of the four tests is presented as a function of c, the correlation between Y1 and Y2; the sample size N; and v1 and v2, the proportions of the variance of Y1 and Y2 explained by the SNP, respectively.
Figure 2
Power to Detect a SNP Associated with Two Traits in a Bivariate Analysis. Power at 5 × 10⁻⁸ significance level to detect the SNP associated with the Y. using the independent tests of Y1, Y2, PC1, and PC2 and a combined PCs test when analyzing 5,000 individuals. The genetic variant has a fixed effect on the trait Y1. The power of each of the four tests is presented as a function of the effect on the second trait Y2 for three levels of correlation between Y1 and Y2: 0.1 (A), 0.5 (B), and 0.9 (C).
Figure 3
Power Comparison for the Multivariate Analysis of Five Traits in the Presence of Pleiotropic Effect. Power at 5 × 10⁻⁸ significance level for the detection of a genetic variant when analyzing five phenotypes. Between one and five phenotypes are simulated as a function of the genetic variant, with the proportion of variance explained by the variant chosen randomly between 0.1% and 0.5%. All genetic effects were positive and the associated phenotypes were randomly selected with equal probability. The bars represent the power of eight different tests. The univariate tests for each PC are shown in light gray (dark gray after correcting for multiple testing), and the univariate test for the most significant PC (tPC) is in blue. The combined test of all five PCs is shown in black, and the most significant univariate test of all Y. in light green (dark green after correcting for multiple testing). The power is shown for four different correlation models and 10,000 simulation replicates with 5,000 individuals.
Figure 4
Power of Alternative Methods for the Multivariate Analysis of Five Traits. A comparison of the power at 5 × 10⁻⁸ significance level for detecting a genetic variant by five different multiple-trait analyses: the combined PCs (CPC, in black); MANOVA (MAN, in red); multitrait mixed model (MTM, in blue); MultiPhen (MUL, in orange); and TATES (TAT, in purple). These tests were applied to five traits where the number of traits with a causal genetic effect was varied between one and five. All genetic effects were positive and associated phenotypes were selected randomly with probability proportional to their level of correlation with the other phenotypes. The proportion of variance explained by the causal variant was randomly chosen between 0.1% and 0.5%. The power is shown for four different correlation models and 10,000 simulation replicates with 5,000 individuals.
Figure 5
Power Comparison for the Multivariate Analysis of 100 Traits. Power at 5 × 10⁻⁸ significance level for seven different tests when analyzing 100 phenotypes across 10,000 replicates. Panels show simulations under schemes SC1 (A), SC2 (B), and SC3 (C) (described in Appendix B and Figure S5). The top, middle, and bottom rows show the power for low, moderate, and high levels of pleiotropy with sample sizes of 3,000, 2,000, and 1,000, respectively. The red curve corresponds to the test combining signals from the n PCs associated with the largest eigenvalues, the dark blue curve corresponds to the test combining signals on the 101-n PCs associated with the smallest eigenvalues, and the black curve corresponds to the combination of the latter two tests by Fisher's method, with n varying from 1 to 100. The dashed lines correspond to the test of all PCs combined (gray), MANOVA (red), multitrait mixed model (MTMM, blue), and TATES (purple).
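The Fisher combination named in the Figure 5 caption can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the authors' procedure: the per-PC statistics are taken to be independent 1-df chi-square statistics ordered by decreasing eigenvalue, and the split is disjoint (the top n PCs versus the remaining K - n) so that the two p-values are independent under the null. The caption's counts (n and 101-n) suggest the original split may differ. The function name `split_and_fisher` and the example values are hypothetical.

```python
import numpy as np
from scipy import stats


def split_and_fisher(chi2_per_pc, n):
    """Combine a top-PC test and a bottom-PC test with Fisher's method.

    chi2_per_pc : 1-df statistics, one per PC, largest eigenvalue first.
    n           : number of top PCs in the first group (1 <= n < K).
    Assumption: disjoint groups, so the two p-values are independent.
    """
    chi2_per_pc = np.asarray(chi2_per_pc, dtype=float)
    k = chi2_per_pc.size
    if not 1 <= n < k:
        raise ValueError("n must satisfy 1 <= n < number of PCs")

    # Sum of independent 1-df chi-squares within each group.
    p_top = stats.chi2.sf(chi2_per_pc[:n].sum(), df=n)
    p_bottom = stats.chi2.sf(chi2_per_pc[n:].sum(), df=k - n)

    # Fisher's method: -2 * sum(log p) is chi-square with 2 * (number of
    # p-values) degrees of freedom under the null; here that is 4.
    fisher_stat = -2.0 * (np.log(p_top) + np.log(p_bottom))
    p_fisher = stats.chi2.sf(fisher_stat, df=4)
    return p_top, p_bottom, p_fisher


# Made-up per-PC statistics with the signal concentrated in the first PC.
example_chi2 = np.array([30.0, 1.0, 0.5, 2.0])
p_top, p_bottom, p_fisher = split_and_fisher(example_chi2, n=1)
print("top:", p_top, "bottom:", p_bottom, "Fisher:", p_fisher)
```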
Figure 6
Multivariate Analysis of Family Data. Comparison of five multivariate tests for the analysis of 10 phenotypes in 200 nuclear families including two parents and one to five children for a total of 1,000 subjects. Pairwise phenotypic correlations follow a gradient from 0 to 0.8 (extended model 2 from Figure 3). (A) QQ plots and lambda values under the null hypothesis of no association between the tested SNP and any of the ten phenotypes. (B) Power under the alternative, when the SNP is associated with three phenotypes chosen randomly, and the proportion of phenotypic variance explained by the SNP varies in [0, 0.025]. We compared MANOVA (red), MultiPhen (orange), TATES (purple), CPC (combined PCs analysis, black), and MTMM (multitrait mixed model, blue). Under the null, all tests except MTMM show an inflated type I error rate when the family structure is not accounted for. Applying a mixed model to the univariate phenotype or univariate PC analysis for TATES and CPC, respectively (dashed lines), solves this issue. Under the alternative, we applied a genomic control (GC) correction to all tests showing type I error inflation. Power was derived at a significance level of 5 × 10⁻³.
