Genet Epidemiol. 2014 Dec;38(8):699-708.
doi: 10.1002/gepi.21864. Epub 2014 Oct 20.

A weighted U-statistic for genetic association analyses of sequencing data

Changshuai Wei et al. Genet Epidemiol. 2014 Dec.

Abstract

Advances in next-generation sequencing technology have generated a massive amount of sequencing data, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, high-dimensional sequencing data pose a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumptions about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained performance comparable to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol.

Keywords: next-generation sequencing; rare variants; weighted U-statistic.
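
To make the idea of a weighted U-statistic concrete, the following is a minimal Python sketch of a generic statistic of the form U = (1 / (n(n-1))) * sum over i ≠ j of w_ij * h(y_i, y_j). It is not the WU-SEQ definition, which the abstract does not give. The specific choices here are assumptions made for illustration only: the kernel h is a product of standardized phenotype ranks, the weights w_ij are a (optionally variant-weighted) cross-product of genotype vectors, and significance is assessed by permutation. All function names are hypothetical.

```python
import numpy as np

def rank_scores(y):
    """Centered, unit-variance rank scores of the phenotype (assumes no ties)."""
    y = np.asarray(y, dtype=float)
    ranks = np.argsort(np.argsort(y)) + 1.0
    scores = ranks - ranks.mean()
    return scores / scores.std()

def similarity_weights(G, variant_weights=None):
    """Pairwise genetic similarity w_ij from an n x p genotype matrix G.

    Assumption: similarity is the weighted cross-product G diag(v) G^T,
    with the diagonal zeroed so only pairs i != j contribute.
    """
    G = np.asarray(G, dtype=float)
    if variant_weights is None:
        variant_weights = np.ones(G.shape[1])
    W = (G * variant_weights) @ G.T
    np.fill_diagonal(W, 0.0)
    return W

def weighted_u(y, G, variant_weights=None):
    """Generic weighted U-statistic: mean over i != j of w_ij * q_i * q_j."""
    q = rank_scores(y)
    W = similarity_weights(G, variant_weights)
    n = len(q)
    return float(q @ W @ q) / (n * (n - 1))

def permutation_pvalue(y, G, n_perm=999, seed=0, variant_weights=None):
    """One-sided permutation p-value obtained by shuffling phenotype scores."""
    rng = np.random.default_rng(seed)
    q = rank_scores(y)
    W = similarity_weights(G, variant_weights)
    observed = q @ W @ q
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(q)
        if p @ W @ p >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the kernel in this sketch depends on the phenotype only through its ranks, the statistic is unchanged by any strictly increasing transformation of the phenotype. This illustrates, in a simplified way, why a rank-based U-statistic can remain stable when the phenotype is heavy-tailed.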

Figures

Figure 1. Distribution of SNVs with MAF < 0.03 for the 4 genes in the Dallas Heart Study
Figure 2. Distributions of the 3 phenotypes in the Dallas Heart Study
