Hum Hered. 2011;71(1):50-8.
doi: 10.1159/000323567. Epub 2011 Mar 10.

A principal components-based clustering method to identify variants associated with complex traits

Mary Helen Black et al. Hum Hered. 2011.

Abstract

Background: Multivariate methods ranging from joint SNP analysis to principal components analysis (PCA) have been developed for testing multiple markers in a region for association with disease and disease-related traits. However, under various scenarios these methods have low power, cannot identify the subset of markers contributing to the evidence for association, or both.
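As a reference point for the PCA approach, the following is a minimal NumPy sketch of a PCA-based global test. It assumes additively coded genotypes (0/1/2 copies of the minor allele) and a quantitative trait. The function names `pcs_explaining` and `global_f_test` are hypothetical, and the 60% variance cutoff is borrowed from the figure captions. This is an illustration, not the authors' code.

```python
import numpy as np


def pcs_explaining(G, threshold=0.6):
    """PC scores of the standardized genotype matrix G (n subjects x m SNPs),
    keeping the fewest PCs whose cumulative share of variance reaches
    `threshold`. Assumes no monomorphic SNPs (every column has sd > 0)."""
    G = np.asarray(G, dtype=float)
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    # Right singular vectors of Z are the PC loadings; s**2 is proportional
    # to the variance each PC explains.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    k = min(k, len(s))
    return Z @ Vt[:k].T, explained[:k]


def global_f_test(y, X):
    """Overall F statistic for H0: all slopes are zero in the OLS model
    y ~ 1 + X. Returns (F, (df1, df2)); assumes n > k + 1."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # intercept + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss_full = np.sum((y - A @ beta) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    F = ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))
    return F, (k, n - k - 1)


# Hypothetical example: 10 independent SNPs, trait driven by SNP index 3
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(500, 10))
y = 0.4 * G[:, 3] + rng.normal(size=500)
scores, expl = pcs_explaining(G)
F, df = global_f_test(y, scores)
```

A p-value follows from the F distribution, for example `scipy.stats.f.sf(F, *df)`. Passing the raw genotype matrix `G` instead of `scores` gives one common form of the joint SNP test, with m numerator degrees of freedom. Note that a significant PCA test does not by itself say which SNPs drive the signal, because every PC is a weighted combination of all markers.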

Methods: We introduce orthoblique principal components-based clustering (OPCC) as an alternative approach to identify specific subsets of markers showing association with a quantitative outcome of interest. We demonstrate the utility of OPCC using simulation studies and an example from the literature on type 2 diabetes.
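The clustering idea can be illustrated with a simplified Python sketch. It is modeled loosely on divisive, oblique variable clustering of the kind performed by SAS PROC VARCLUS; that resemblance is an assumption here, not a description of the authors' implementation. All SNPs start in one cluster. While the first principal components of the clusters together explain less than a target share of total SNP variance (60% here, the cutoff used in the figure captions), the cluster with the largest second eigenvalue is split by quartimax-rotating its first two components and assigning each SNP to the rotated component on which it loads most heavily. Each final cluster is summarized by its own first PC, so cluster scores may be correlated with one another. All function names are hypothetical, the quartimax angle is found by a simple grid search, and the iterative SNP reassignment step of full procedures is omitted.

```python
import numpy as np


def standardize(G):
    """Center and scale each SNP column (assumes no monomorphic SNPs)."""
    G = np.asarray(G, dtype=float)
    return (G - G.mean(axis=0)) / G.std(axis=0)


def eig_desc(Zc):
    """Eigenvalues/eigenvectors, largest first, of the correlation matrix of Zc."""
    R = np.atleast_2d(np.corrcoef(Zc, rowvar=False))  # 1x1 for a single SNP
    vals, vecs = np.linalg.eigh(R)                      # eigh sorts ascending
    return vals[::-1], vecs[:, ::-1]


def quartimax_split(Zc, n_grid=180):
    """Boolean mask splitting a cluster (>= 2 SNPs) into two groups."""
    vals, vecs = eig_desc(Zc)
    # Loadings = correlations of each SNP with the first two PCs
    L = vecs[:, :2] * np.sqrt(np.clip(vals[:2], 0, None))

    def rotate(theta):
        c, s = np.cos(theta), np.sin(theta)
        return L @ np.array([[c, -s], [s, c]])

    # Quartimax criterion (sum of fourth powers) repeats every pi/2
    thetas = np.linspace(0, np.pi / 2, n_grid, endpoint=False)
    best = max(thetas, key=lambda t: np.sum(rotate(t) ** 4))
    Lr = rotate(best)
    return np.abs(Lr[:, 0]) >= np.abs(Lr[:, 1])


def opcc_like(G, threshold=0.6):
    """Hypothetical OPCC-style clustering; returns (clusters, cluster scores)."""
    Z = standardize(G)
    m = Z.shape[1]
    clusters = [np.arange(m)]
    while True:
        # Share of total variance (m standardized SNPs) explained by the
        # first PC of every cluster
        if sum(eig_desc(Z[:, c])[0][0] for c in clusters) / m >= threshold:
            break
        # Try clusters in order of decreasing second eigenvalue
        cand = sorted(((eig_desc(Z[:, c])[0][1], i)
                       for i, c in enumerate(clusters) if len(c) > 1),
                      reverse=True)
        for _, i in cand:
            mask = quartimax_split(Z[:, clusters[i]])
            if 0 < mask.sum() < len(mask):
                c = clusters.pop(i)
                clusters += [c[mask], c[~mask]]
                break
        else:
            break  # no cluster can be split further
    # Cluster score = first PC of that cluster's standardized SNPs
    scores = np.column_stack([Z[:, c] @ eig_desc(Z[:, c])[1][:, 0]
                              for c in clusters])
    return clusters, scores
```

The columns of `scores` can then be tested against the trait one at a time. Because `clusters` records which SNP indices each score summarizes, a significant cluster points back to a small subset of candidate markers, which is the property the abstract emphasizes over plain PCA.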

Results: Compared with traditional methods, OPCC has similar or improved power under various scenarios of linkage disequilibrium (LD) structure and genotype availability. Most importantly, our simulations show that OPCC accurately narrows a large number of markers down to a subset containing the causal variant or its proxy.

Conclusion: OPCC is a powerful and efficient data reduction method for detecting associations between gene variants and disease-related traits. Unlike alternative methodologies, OPCC can isolate the effect of causal SNP(s) from among large sets of markers in a candidate region. Therefore, OPCC is an improvement over PCA for testing multiple SNP associations with phenotypes of interest.

Figures

Fig. 1
LD structure among the 144 SNPs in the simulated TCF7L2 from a single simulated data set. (a) LD represented by pairwise r². (b) LD as a function of D'. The color key denotes LD, as either r² or D', on a scale from 0 to 1.
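The two panels can look quite different because the two LD measures respond differently to allele frequencies: |D'| equals 1 whenever one of the four two-SNP haplotypes is absent, whereas r² also requires the allele frequencies to match. A minimal sketch of both measures for one SNP pair follows, assuming phased haplotypes coded 0/1; the function name `ld_stats` and the toy haplotypes are hypothetical.

```python
import numpy as np


def ld_stats(h1, h2):
    """D, |D'| and r^2 between two biallelic SNPs, from phased haplotypes
    coded 0/1 (1 = the allele of interest). Assumes both SNPs are polymorphic."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    p_a, p_b = h1.mean(), h2.mean()          # allele frequencies
    p_ab = np.mean(h1 * h2)                   # frequency of the 1-1 haplotype
    D = p_ab - p_a * p_b
    # The maximum attainable |D| depends on the sign of D
    if D >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(D) / d_max
    r2 = D**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return D, d_prime, r2


# Hypothetical haplotypes: SNP 2's allele occurs only alongside SNP 1's allele
D, d_prime, r2 = ld_stats([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy pair |D'| is 1 (the haplotype carrying allele 1 at SNP 2 and allele 0 at SNP 1 never occurs) while r² is only 1/3, because the two alleles have frequencies 0.5 and 0.25.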
Fig. 2
Estimated power for global association. Estimated power for the global test of association for PCA and OPCC is shown for scenario 2 of genotype availability (54 tag SNPs plus the causal variant, CV) and for each of the different CVs in the simulated TCF7L2 (see text for details). The white bar denotes joint SNP analysis. Light and dark grey bars denote, respectively, PCA using the number of PCs and OPCC using the number of clusters needed to explain 60% of SNP variation.
Fig. 3
Estimated power for univariate test of association. Estimated power for the univariate test of association for PCA and OPCC is shown for scenario 2 of genotype availability (54 tag SNPs plus the causal variant, CV) and for each of the different CVs in the simulated TCF7L2 (see text for details). The white bar denotes single SNP analysis. Light and dark grey bars denote, respectively, PCA using the number of PCs and OPCC using the number of clusters needed to explain 60% of SNP variation.
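Empirical power of the kind plotted in these figures is conventionally estimated as the proportion of simulated replicates in which a test rejects the null hypothesis. A minimal sketch for the single SNP test follows. The sample size, minor allele frequency, effect size and the normal-approximation cutoff of 1.96 (two-sided alpha = 0.05) are hypothetical defaults and do not reproduce the paper's TCF7L2 simulation design; `univariate_t` and `estimate_power` are illustrative names.

```python
import numpy as np


def univariate_t(y, X):
    """Slope t statistic for each simple regression y ~ 1 + X[:, j]."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(Xc**2, axis=0)
    beta = Xc.T @ yc / sxx                    # OLS slopes, one per column
    resid = yc[:, None] - Xc * beta           # residuals, one column per fit
    sigma2 = np.sum(resid**2, axis=0) / (n - 2)
    return beta / np.sqrt(sigma2 / sxx)


def estimate_power(n=500, maf=0.3, beta=0.2, reps=200, crit=1.96, seed=0):
    """Share of simulated data sets in which |t| exceeds `crit`. The default
    1.96 is the normal approximation to a two-sided alpha = 0.05 cutoff."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        g = rng.binomial(2, maf, size=(n, 1))      # one additively coded SNP
        y = beta * g[:, 0] + rng.normal(size=n)    # quantitative trait
        rejections += int(abs(univariate_t(y, g)[0]) > crit)
    return rejections / reps
```

Setting `beta=0.0` turns the same loop into an estimate of the type I error rate. The same `univariate_t` function accepts a matrix of PC or cluster scores in place of genotypes, although testing several columns at once calls for a multiple-testing correction that this sketch does not apply.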

