BMC Proc. 2011 Nov 29;5(Suppl 9):S29.
doi: 10.1186/1753-6561-5-S9-S29.

Use of principal components to aggregate rare variants in case-control and family-based association studies in the presence of multiple covariates

Rémi Kazma et al. BMC Proc.

Abstract

Rare variants may help to explain some of the missing heritability of complex diseases. Technological advances in next-generation sequencing give us the opportunity to test this hypothesis. We propose two new methods (one for case-control studies and one for family-based studies) that combine aggregated rare variants and common variants located within a region through principal components analysis and allow for covariate adjustment. We analyzed 200 replicates consisting of 209 case subjects and 488 control subjects and compared the results to weight-based and step-up aggregation methods. The principal components and collapsing method showed an association between the gene FLT1 and the quantitative trait Q1 (P < 10⁻³⁰) in a fraction of the computation time of the other methods. The proposed family-based test gave inconclusive results. The two methods provide a fast way to analyze rare and common variants simultaneously at the gene level while adjusting for covariates. However, further evaluation of the statistical efficiency of this approach is warranted.
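The general idea in the abstract (collapse rare variants, combine them with common variants through principal components, then test the gene while adjusting for covariates) can be illustrated with a rough sketch. This is not the authors' implementation: the 5% minor allele frequency cutoff, the 90% variance cutoff for retaining components, the linear model with a nested F-test, the simulated data, and all function names are assumptions made for illustration only.

```python
import numpy as np

def collapse_rare(genotypes, maf_threshold=0.05):
    """Keep common variants as-is; collapse rare variants into one carrier indicator.

    Assumes genotypes are coded as 0/1/2 counts of the minor allele.
    """
    maf = genotypes.mean(axis=0) / 2.0
    rare = maf < maf_threshold
    columns = [genotypes[:, ~rare].astype(float)]
    if rare.any():
        # 1 if the subject carries at least one rare allele in the region, else 0
        carrier = (genotypes[:, rare].sum(axis=1) > 0).astype(float)
        columns.append(carrier[:, None])
    return np.hstack(columns)

def principal_components(X, var_explained=0.9):
    """Scores on the leading PCs that together explain at least var_explained."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(cum, var_explained)) + 1, len(s))
    return Xc @ Vt[:k].T

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def gene_f_test(y, covariates, pcs):
    """Nested F-test: do the gene's PCs improve on a covariates-only model?"""
    n = len(y)
    reduced = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([reduced, pcs])
    df1, df2 = pcs.shape[1], n - full.shape[1]
    rss_full = rss(full, y)
    f_stat = ((rss(reduced, y) - rss_full) / df1) / (rss_full / df2)
    return f_stat, df1, df2

# Simulated example (hypothetical data): two common and five rare variants in one gene.
rng = np.random.default_rng(0)
n = 400
mafs = np.array([0.30, 0.40] + [0.005] * 5)
genotypes = rng.binomial(2, mafs, size=(n, len(mafs)))
covariates = np.column_stack([rng.normal(50, 10, n), rng.integers(0, 2, n)])
trait = 1.0 * genotypes[:, 0] + 0.02 * covariates[:, 0] + rng.normal(0, 0.5, n)

collapsed = collapse_rare(genotypes)
pcs = principal_components(collapsed)
f_stat, df1, df2 = gene_f_test(trait, covariates, pcs)
print(f"F = {f_stat:.1f} on ({df1}, {df2}) degrees of freedom")
```

A P-value could be obtained from the F distribution with (df1, df2) degrees of freedom, for example with `scipy.stats.f.sf(f_stat, df1, df2)`. For a binary disease phenotype, a logistic model with a likelihood ratio test would be the analogous choice; that substitution is likewise an assumption rather than something the abstract specifies.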

Figures

Figure 1
Top 30 genes associated with Q1 using the principal components and collapsing method with a case-control design. (a) Association with Q1 adjusting for age, sex, population, and smoking. (b) Association with Q1 adjusting for age, sex, population, smoking, and FLT1. Box plots represent the distribution of the 200 P-values of the 200 replicates for the 30 genes with the highest median P-values.
Figure 2
Top 30 genes associated with disease phenotype using the principal components and collapsing method with a case-control design. (a) Association with disease adjusting for age, sex, population, and smoking. (b) Association with disease adjusting for age, sex, population, smoking, and Q1. Box plots represent the distribution of the 200 P-values of the 200 replicates for the 30 genes with the highest median P-values.
Figure 3
Proportion of variability within genes explained by each of the first 10 principal components.
Figure 4
Top 30 genes associated with Q1 using the weight-based and step-up methods with a case-control design. (a) Weight-based method adjusting for age, sex, population, and smoking. (b) Step-up method adjusting for age, sex, population, and smoking. Box plots represent the distribution of the 200 P-values of the 200 replicates for the 30 genes with the highest median P-values.
Figure 5
Top 30 genes associated with Q1 using the principal components and collapsing method with a family-based design. Box plots represent the distribution of the 200 P-values of the 200 replicates for the 30 genes with the highest median P-values.
