Am J Hum Genet. 2011 Aug 12;89(2):277-88.

doi: 10.1016/j.ajhg.2011.07.007.

Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression

Jung-Ying Tzeng¹, Daowen Zhang, Monnat Pongpanich, Chris Smith, Mark I McCarthy, Michèle M Sale, Bradford B Worrall, Fang-Chi Hsu, Duncan C Thomas, Patrick F Sullivan

PMID: 21835306
PMCID: PMC3155192
DOI: 10.1016/j.ajhg.2011.07.007

Jung-Ying Tzeng et al. Am J Hum Genet. 2011.

Affiliation

¹ Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA. jung-ying_tzeng@ncsu.edu

Abstract

Genomic association analyses of complex traits demand statistical tools that can detect small effects of common and rare variants and model complex interaction effects while remaining computationally feasible. In this work, we introduce a similarity-based regression method for assessing the main genetic and interaction effects of a group of markers on quantitative traits. The method uses genetic similarity to aggregate information from multiple polymorphic sites and integrates adaptive weights that depend on allele frequencies to accommodate common and uncommon variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals that have opposite etiological effects and is applicable to any class of genetic variants without the need to dichotomize the allele types. To assess gene-trait associations, we regress trait similarities for pairs of unrelated individuals on their genetic similarities and assess association by using a score test whose limiting distribution is derived in this work. The proposed regression framework allows for covariates, has the capacity to model both main and interaction effects, can be applied to a mixture of different polymorphism types, and is computationally efficient. These features make it an ideal tool for evaluating associations between phenotype and marker sets defined by linkage disequilibrium (LD) blocks, genes, or pathways in whole-genome analysis.
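The idea of relating pairwise trait similarity to pairwise genetic similarity can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the inverse square-root frequency weight in `frequency_weights`, the weighted identity-by-state similarity, the mean-centered trait without covariates, and the permutation null, which stands in for the limiting distribution derived in the paper. All function names are hypothetical.

```python
import numpy as np

def frequency_weights(genotypes):
    """Illustrative allele-frequency weights (assumption: w = q ** -0.5)."""
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    q = g.mean(axis=0) / 2.0             # frequency of the counted allele
    q = np.clip(q, 1.0 / (2 * n), None)  # guard against monomorphic markers
    return q ** -0.5

def weighted_ibs_similarity(genotypes, weights):
    """Weighted identity-by-state similarity for 0/1/2 allele counts.

    Builds an (n, n, m) array, so it is only suitable for small examples.
    """
    g = np.asarray(genotypes, dtype=float)
    w = np.asarray(weights, dtype=float)
    ibs = 2.0 - np.abs(g[:, None, :] - g[None, :, :])  # alleles shared per marker
    return (ibs * w).sum(axis=2) / (2.0 * w.sum())     # scaled so self-similarity is 1

def similarity_score_statistic(trait, similarity):
    """Sum over pairs i != j of S_ij * r_i * r_j, with r the centered trait."""
    r = np.asarray(trait, dtype=float)
    r = r - r.mean()
    s = np.array(similarity, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(s, 0.0)               # only pairs of distinct individuals
    return float(r @ s @ r)

def permutation_p_value(trait, similarity, n_perm=999, seed=0):
    """Permutation p value (a stand-in for the paper's asymptotic null)."""
    rng = np.random.default_rng(seed)
    trait = np.asarray(trait, dtype=float)
    observed = similarity_score_statistic(trait, similarity)
    exceed = sum(
        similarity_score_statistic(rng.permutation(trait), similarity) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

A large positive statistic means that individuals who are genetically similar also tend to have similar trait values. Because similarity is computed per pair rather than by summing genotypes, markers with opposite directions of effect do not cancel each other in this construction.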

Figures

**Figure 1**
Type I Error Rates of the Proposed Methods. The type I error rates are shown on the scale of 10², 10³, and 10⁴ for nominal levels α = 0.05, 0.005, and 0.0005, respectively. The regions are randomly selected from chromosome 21 to represent six different scenarios listed on the x axis: two levels of disease allele frequencies (q = 0.1 and 0.3) combined with three levels of LD pattern (high, medium, and low). A high-LD value reflects stronger correlation between the observed markers and the two unobserved risk loci. The panel titles indicate the value of (γ_G₁, γ_G₂, γ_GE₁, γ_GE₂), that is, the effect sizes of the main genetic effects and gene-environment interactions at the two risk loci used in generating simulated data. Each type I error rate is calculated on the basis of 50,000 replications for (γ_G₁, γ_G₂, γ_GE₁, γ_GE₂) = (0, 0, 0, 0) and 20,000 replications for (0.2, 0.2, 0, 0). The type I error rates for HAP-G at α = 0.0005 are given here because some are beyond the plotting range: 0.00454, 0.00266, 0.0023, 0.00158, 0.00794, and 0.00072.

**Figure 2**
Boxplot of Power of G × E Test from the 1734 Regions on Chromosome 21. The × sign indicates the average power. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into 12 categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A) both allele frequencies < 0.05; (B) sums of allele frequencies < 0.3 but excluding (A); (C) sums of allele frequencies between 0.3 and 0.6; and (D) sums of allele frequencies > 0.6. The clustering of LD patterns is done according to the following thresholds: average R² > 0.6 for high (LD-H), average R² ∈ (0.25, 0.6) for medium (LD-M), and average R² < 0.25 for low (LD-L).

**Figure 3**
Boxplot of Power of G Test from the 1734 Regions on Chromosome 21. The × sign indicates the average power. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into 12 categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A) both allele frequencies < 0.05; (B) sums of allele frequencies < 0.3 but excluding (A); (C) sums of allele frequencies between 0.3 and 0.6; and (D) sums of allele frequencies > 0.6. The clustering of LD patterns is done according to the following thresholds: average R² > 0.6 for high (LD-H), average R² ∈ (0.25, 0.6) for medium (LD-M), and average R² < 0.25 for low (LD-L).

**Figure 4**
Boxplot of Power of Joint Test from the 1734 Regions on Chromosome 21. The × sign indicates the average power. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into 12 categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A) both allele frequencies < 0.05; (B) sums of allele frequencies < 0.3 but excluding (A); (C) sums of allele frequencies between 0.3 and 0.6; and (D) sums of allele frequencies > 0.6. The clustering of LD patterns is done according to the following thresholds: average R² > 0.6 for high (LD-H), average R² ∈ (0.25, 0.6) for medium (LD-M), and average R² < 0.25 for low (LD-L).

**Figure 5**
Boxplot of Power of G × E Test and G Test with Different Weights (SIM1, SIM2, and SIM0) from the 1734 Regions on Chromosome 21. The × sign indicates the average power of the method shown on the x axis. The solid and dotted lines indicate the average power of the SNP test and the HAP test, respectively. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into nine categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A and D) both allele frequencies < 0.05; (B and E) sums of allele frequencies < 0.3 but excluding (A) and (D); and (C and F) sums of allele frequencies > 0.3. The clustering of LD patterns is done according to the following thresholds: average R² > 0.6 for high (LD-H), average R² ∈ (0.25, 0.6) for medium (LD-M), and average R² < 0.25 for low (LD-L).

**Figure 6**
p Values with Negative Log 10 Transformation for the VISP Trial Analysis. The x axis shows the gene IDs sorted alphabetically by gene name, and gene ID 39 is *CBS*. The red line indicates results for SIM1, + for the SNP method, and × for the HAP method. The results for the SNP method are based on the adjusted minimum p values that adjust for the multiple SNPs in a gene. The adjusted minimum p value is obtained by 1 − (1 − raw p value)^k_eff, where *k_eff* is the effective number of independent tests estimated with the method of Moskvina and Schmidt after accounting for the LD among SNPs in a gene. A few genes are not plotted on the graph for the HAP method because of convergence failure at these locations. This failure is mostly attributed to an excessive number of SNPs in the gene.
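The adjustment described in the Figure 6 caption can be written as a short function. This is a sketch under two assumptions: the "raw p value" in the formula is taken to be the smallest per-SNP p value in the gene, and `k_eff` is supplied by the caller, since the Moskvina and Schmidt estimation step is not reproduced here. The function name is hypothetical.

```python
import math

def adjusted_min_p(raw_p_values, k_eff):
    """Return 1 - (1 - p_min) ** k_eff for the smallest raw p value.

    k_eff is the effective number of independent tests; it must be
    estimated separately and is treated as a given input here.
    """
    p_min = min(raw_p_values)
    if not 0.0 <= p_min <= 1.0:
        raise ValueError("p values must lie in [0, 1]")
    if k_eff <= 0:
        raise ValueError("k_eff must be positive")
    if p_min == 1.0:
        return 1.0
    # expm1/log1p keep precision when p_min is very small
    return -math.expm1(k_eff * math.log1p(-p_min))
```

For example, `adjusted_min_p([0.01, 0.2], 2)` evaluates 1 − 0.99², which is 0.0199 up to floating-point rounding. With `k_eff = 1` the raw minimum p value is returned unchanged.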
