Genet Epidemiol. 2014 Apr;38(3):242-53.
doi: 10.1002/gepi.21790. Epub 2014 Jan 30.

A generalized genetic random field method for the genetic association analysis of sequencing data

Ming Li et al. Genet Epidemiol. 2014 Apr.

Abstract

With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.

Keywords: generalized estimating equation; rare variants; small-scale sequencing studies.

Figures

Figure B1
Type I error and power of GGRF, SKAT, and the burden test with a decreasing ratio of causal variants to noise variants. Left: quantitative phenotypes; Right: binary phenotypes. T1E: type I error; 1 Direction: effect sizes in one direction; Bidirection: effect sizes in both directions.
Figure 1
Distribution of the minor allele frequencies of 508 sequence variants on chromosome 22 in exome sequencing data from the 1000 Genomes Project.
Figure 2
Shape of four types of weight functions used in the simulations. Maximum weight at MAF of 0.07% was rescaled to be 1 for each weight function. The scaling does not change the relative contribution of variants.
Figure 3
Type I error and power of GGRF and SKAT using four SNP-specific weights under four disease models. Left: quantitative phenotypes; Right: binary phenotypes. T1E: type I error; S1–S4: power under various disease scenarios. S1: effect sizes of causal variants are all equal; S2: effect sizes of causal variants are proportional to BETA weights; S3: effect sizes of causal variants are proportional to WSS weights; S4: effect sizes of causal variants are proportional to LOG weights.
Figure 4
Type I error and power of GGRF and SKAT with a decreasing ratio of causal variants to noise variants. Left: quantitative phenotypes; Right: binary phenotypes. T1E: type I error; S1–S4: power under various disease scenarios. S1: effect sizes of causal variants are all equal; S2: effect sizes of causal variants are proportional to BETA weights; S3: effect sizes of causal variants are proportional to WSS weights; S4: effect sizes of causal variants are proportional to LOG weights.
Figure 5
Type I error and power of GGRF/SKAT with various similarity-metrics/kernel-metrics. Top left: type I error for quantitative phenotypes; Bottom left: power for quantitative phenotypes. Top right: type I error for binary phenotypes; Bottom right: power for binary phenotypes. ADJ: bootstrap adjustment for SKAT, only available with binary phenotypes, linear kernel, and BETA weight.
Figure 6
Distribution of minor allele frequencies in ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in 2,658 subjects from the DHS sequencing data.
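
The four weight functions shown in Figure 2 are not defined on this page. As a rough illustration only, the sketch below assumes they are an unweighted scheme, the Beta(1, 25) density that SKAT uses by default, a WSS-style weight 1/sqrt(MAF(1 - MAF)), and a -log10(MAF) weight. These formulas and all function names are assumptions, not taken from the article. Each weight is rescaled to equal 1 at a MAF of 0.07%, as the Figure 2 caption describes.

```python
import math

# MAF at which Figure 2 rescales every weight function to 1 (0.07%).
REF_MAF = 0.0007


def uw_weight(maf):
    """Unweighted scheme (assumed): every variant gets the same weight."""
    return 1.0


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at the MAF; a=1, b=25 is SKAT's default (assumed here)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1.0) * math.log(maf) + (b - 1.0) * math.log1p(-maf)
    )


def wss_weight(maf):
    """WSS-style weight (assumed form): inverse binomial standard deviation."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def log_weight(maf):
    """LOG weight (assumed form): negative log10 of the MAF."""
    return -math.log10(maf)


def rescaled(weight_fn, maf, ref_maf=REF_MAF):
    """Rescale a weight so that it equals 1 at ref_maf."""
    return weight_fn(maf) / weight_fn(ref_maf)


if __name__ == "__main__":
    schemes = [("UW", uw_weight), ("BETA", beta_weight),
               ("WSS", wss_weight), ("LOG", log_weight)]
    for maf in (0.0007, 0.01, 0.05, 0.2):
        row = ", ".join(f"{name}={rescaled(fn, maf):.3f}" for name, fn in schemes)
        print(f"MAF={maf}: {row}")
```

Because every function is divided by its own value at the reference MAF, the rescaling changes only the absolute scale of each curve and leaves the relative contribution of variants within a scheme unchanged, which matches the remark in the caption.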
