Am J Hum Genet. 2011 Aug 12;89(2):277-88.
doi: 10.1016/j.ajhg.2011.07.007.

Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression

Jung-Ying Tzeng et al. Am J Hum Genet.

Abstract

Genomic association analyses of complex traits demand statistical tools that are capable of detecting small effects of common and rare variants and modeling complex interaction effects and yet are computationally feasible. In this work, we introduce a similarity-based regression method for assessing the main genetic and interaction effects of a group of markers on quantitative traits. The method uses genetic similarity to aggregate information from multiple polymorphic sites and integrates adaptive weights that depend on allele frequencies to accommodate common and uncommon variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals that have opposite etiological effects and is applicable to any class of genetic variants without the need for dichotomizing the allele types. To assess gene-trait associations, we regress trait similarities for pairs of unrelated individuals on their genetic similarities and assess association by using a score test whose limiting distribution is derived in this work. The proposed regression framework allows for covariates, has the capacity to model both main and interaction effects, can be applied to a mixture of different polymorphism types, and is computationally efficient. These features make it an ideal tool for evaluating associations between phenotype and marker sets defined by linkage disequilibrium (LD) blocks, genes, or pathways in whole-genome analysis.
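The core idea of regressing pairwise trait similarity on pairwise genetic similarity can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the function names, the identity-by-state (IBS) similarity, the weight 1/sqrt(q(1 − q)), the use of residual cross-products as trait similarity, and the plain least-squares slope. The sketch does not implement the score test or its limiting distribution.

```python
import numpy as np


def weighted_ibs_similarity(G, weights):
    """Pairwise weighted IBS similarity (hypothetical helper).

    G is an (n, m) genotype matrix coded 0/1/2; weights has length m.
    The IBS sharing at one marker is 2 - |g_i - g_j|, so it lies in {0, 1, 2}.
    """
    G = np.asarray(G, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diff = np.abs(G[:, None, :] - G[None, :, :])   # shape (n, n, m)
    return ((2.0 - diff) * weights).sum(axis=-1) / weights.sum()


def similarity_regression_slope(y, G, X=None):
    """Least-squares slope of trait similarity on genetic similarity.

    Assumed, not taken from the source: trait similarity is the product of
    covariate-adjusted residuals, and weights are 1 / sqrt(q * (1 - q)).
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = len(y)

    # Adjust the trait for an intercept and any covariates.
    design = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    # Allele-frequency-dependent weights; clipping guards monomorphic markers.
    q = np.clip(G.mean(axis=0) / 2.0, 1e-6, 1.0 - 1e-6)
    weights = 1.0 / np.sqrt(q * (1.0 - q))
    S = weighted_ibs_similarity(G, weights)

    # Use each unordered pair of distinct individuals once.
    upper = np.triu_indices(n, k=1)
    trait_sim = np.outer(resid, resid)[upper]
    gene_sim = S[upper]

    centered = gene_sim - gene_sim.mean()
    return float(centered @ trait_sim / (centered @ centered))
```

A positive slope indicates that genetically similar pairs also tend to have similar traits. Forming the full (n, n, m) array is convenient for small marker sets but would need to be done marker by marker for large samples.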


Figures

Figure 1
Type I Error Rates of the Proposed Methods. The type I error rates are shown on the scale of 10^2, 10^3, and 10^4 for nominal levels α = 0.05, 0.005, and 0.0005, respectively. The regions are randomly selected from chromosome 21 to represent six different scenarios listed on the x axis: two levels of disease allele frequencies (q = 0.1 and 0.3) combined with three levels of LD pattern (high, medium, and low). A high-LD value reflects stronger correlation between the observed markers and the two unobserved risk loci. The panel titles indicate the value of (γG1, γG2, γGE1, γGE2), that is, the effect sizes of the main genetic effects and gene-environment interactions at the two risk loci used in generating simulated data. Each of the type I error rates is calculated on the basis of 50,000 replications for (γG1, γG2, γGE1, γGE2) = (0, 0, 0, 0) and 20,000 replications for (0.2, 0.2, 0, 0). The type I error rates for HAP-G at α = 0.0005 are listed here because some fall beyond the plotting range: 0.00454, 0.00266, 0.0023, 0.00158, 0.00794, and 0.00072.
Figure 2
Boxplot of Power of G × E Test from the 1734 Regions on Chromosome 21. The × sign indicates the average power. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into 12 categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A) both allele frequencies < 0.05; (B) sums of allele frequencies < 0.3 but excluding (A); (C) sums of allele frequencies between 0.3 and 0.6; and (D) sums of allele frequencies > 0.6. The clustering of LD patterns is done according to the following thresholds: average R^2 > 0.6 for high (LD-H), average R^2 ∈ (0.25, 0.6) for medium (LD-M), and average R^2 < 0.25 for low (LD-L).
Figure 3
Boxplot of Power of G Test from the 1734 Regions on Chromosome 21. The × sign indicates the average power. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into 12 categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A) both allele frequencies < 0.05; (B) sums of allele frequencies < 0.3 but excluding (A); (C) sums of allele frequencies between 0.3 and 0.6; and (D) sums of allele frequencies > 0.6. The clustering of LD patterns is done according to the following thresholds: average R^2 > 0.6 for high (LD-H), average R^2 ∈ (0.25, 0.6) for medium (LD-M), and average R^2 < 0.25 for low (LD-L).
Figure 4
Boxplot of Power of Joint Test from the 1734 Regions on Chromosome 21. The × sign indicates the average power. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into 12 categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A) both allele frequencies < 0.05; (B) sums of allele frequencies < 0.3 but excluding (A); (C) sums of allele frequencies between 0.3 and 0.6; and (D) sums of allele frequencies > 0.6. The clustering of LD patterns is done according to the following thresholds: average R^2 > 0.6 for high (LD-H), average R^2 ∈ (0.25, 0.6) for medium (LD-M), and average R^2 < 0.25 for low (LD-L).
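The 12-category grouping used in Figures 2 to 4 (four allele-frequency classes crossed with three LD classes) can be written as a small classifier. This is only a sketch of the thresholds stated in the captions; the function names are hypothetical, and the handling of values that fall exactly on a boundary (0.25, 0.3, or 0.6) is an assumption, since the captions leave those cases open.

```python
def ld_category(avg_r2):
    """Map an average R^2 to the LD class named in the captions."""
    if avg_r2 > 0.6:
        return "LD-H"
    if avg_r2 > 0.25:
        return "LD-M"   # assumption: exactly 0.6 is treated as medium
    return "LD-L"       # assumption: exactly 0.25 is treated as low


def frequency_category(q1, q2):
    """Map the two risk allele frequencies to class A, B, C, or D."""
    total = q1 + q2
    if q1 < 0.05 and q2 < 0.05:
        return "A"
    if total < 0.3:
        return "B"
    if total <= 0.6:    # assumption: boundary sums 0.3 and 0.6 go to C
        return "C"
    return "D"


def region_group(q1, q2, avg_r2):
    """Combine both labels, for example 'B/LD-M'."""
    return f"{frequency_category(q1, q2)}/{ld_category(avg_r2)}"
```

For instance, `region_group(0.1, 0.1, 0.4)` falls in class B with medium LD.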
Figure 5
Boxplot of Power of G × E Test and G Test with Different Weights (SIM1, SIM2, and SIM0) from the 1734 Regions on Chromosome 21. The × sign indicates the average power of the method shown on the x axis. The solid and dotted lines indicate the average power of the SNP test and HAP test, respectively. The power at a region is calculated on the basis of 100 replications at a nominal level of 0.0005. The results are grouped into nine categories on the basis of frequencies of the risk alleles and LD patterns. The risk allele frequencies from rare to common are categorized as (A and D) both allele frequencies < 0.05; (B and E) sums of allele frequencies < 0.3 but excluding (A) and (D); and (C and F) sums of allele frequencies > 0.3. The clustering of LD patterns is done according to the following thresholds: average R^2 > 0.6 for high (LD-H), average R^2 ∈ (0.25, 0.6) for medium (LD-M), and average R^2 < 0.25 for low (LD-L).
Figure 6
p Values with Negative Log10 Transformation for the VISP Trial Analysis. The x axis shows the gene IDs sorted in alphabetical order of the gene names, and gene ID 39 is CBS. The red line indicates results for SIM1, + for the SNP method, and × for the HAP method. The results for the SNP method are based on the adjusted minimum p values, which adjust for the multiple SNPs in a gene. The adjusted minimum p value is obtained as 1 − (1 − raw p value)^keff, where keff is the effective number of independent tests estimated with the method of Moskvina and Schmidt after accounting for the LD among SNPs in a gene. A few genes are not plotted on the graph for the HAP method because of convergence failure at these locations. This failure is mostly attributed to an excessive number of SNPs in the gene.
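The adjustment in the Figure 6 caption is simple arithmetic and can be sketched as follows. The function name is hypothetical, and applying the formula to the smallest raw p value in the gene is an assumption about how "adjusted minimum p value" is meant. The value of keff is taken as an input here; estimating it with the method of Moskvina and Schmidt is not shown.

```python
def adjusted_min_p(raw_p_values, k_eff):
    """Adjust the smallest SNP-level p value in a gene for k_eff effective tests.

    Implements 1 - (1 - p_min)^k_eff from the caption, where p_min is
    assumed to be the minimum raw p value across the SNPs in the gene.
    """
    p_min = min(raw_p_values)
    return 1.0 - (1.0 - p_min) ** k_eff
```

With a single effective test the adjustment leaves the minimum p value unchanged, and it grows toward 1 as keff increases.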
