Genetics. 2008 May;179(1):637-50.
doi: 10.1534/genetics.107.082370. Epub 2008 May 5.

Gene-centric genomewide association study via entropy

Yuehua Cui et al. Genetics. 2008 May.

Abstract

Genes are the functional units in most organisms. Compared to genetic variants located outside genes, genic variants are more likely to affect disease risk. The development of the human HapMap project provides an unprecedented opportunity for genetic association studies at the genomewide level for elucidating disease etiology. Currently, most association studies at the single-nucleotide polymorphism (SNP) or the haplotype level rely on the linkage information between SNP markers and disease variants, with which association findings are difficult to replicate. Moreover, variants in genes might not be sufficiently covered by currently available methods. In this article, we present a gene-centric approach via entropy statistics for a genomewide association study to identify disease genes. The new entropy-based approach considers genic variants within one gene simultaneously and is developed on the basis of a joint genotype distribution among genetic variants for an association test. A grouping algorithm based on a penalized entropy measure is proposed to reduce the dimension of the test statistic. Type I error rates and power of the entropy test are evaluated through extensive simulation studies. The results indicate that the entropy test has stable power under different disease models with a reasonable sample size. Compared to single SNP-based analysis, the gene-centric approach has greater power, especially when there is more than one disease variant in a gene. As the genomewide genic SNPs become available, our entropy-based gene-centric approach would provide a robust and computationally efficient way for gene-based genomewide association study.
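The abstract gives no formulas, so the following is only a minimal sketch of the ingredients it names: a joint genotype distribution over the variants in one gene, an entropy computed from that distribution, and the pooling of low-frequency joint genotype categories to reduce dimension. It is not the authors' test statistic or their penalized entropy measure. The function names, the 0/1/2 genotype coding, the toy data, and the fixed threshold of 0.2 are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def joint_genotype_freqs(genotypes):
    """Relative frequency of each joint genotype.

    `genotypes` is a list with one tuple per individual; each tuple holds
    that individual's genotype codes (assumed 0/1/2) at the SNPs in a gene.
    """
    counts = Counter(tuple(g) for g in genotypes)
    n = len(genotypes)
    return {g: c / n for g, c in counts.items()}


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * log(p) for p in freqs.values() if p > 0)


def group_rare(freqs, threshold):
    """Pool every category with frequency below `threshold` into 'other'.

    The threshold is supplied by the caller here; how the paper chooses it
    (via its penalized entropy measure) is not reproduced.
    """
    kept = {g: p for g, p in freqs.items() if p >= threshold}
    pooled = sum(p for p in freqs.values() if p < threshold)
    if pooled > 0:
        kept["other"] = pooled
    return kept


# Hypothetical toy data: six cases and six controls typed at two SNPs.
cases = [(0, 1), (0, 1), (1, 1), (2, 0), (0, 1), (1, 1)]
controls = [(0, 0), (0, 1), (0, 0), (1, 1), (0, 0), (2, 2)]

case_freqs = joint_genotype_freqs(cases)
control_freqs = joint_genotype_freqs(controls)

grouped_cases = group_rare(case_freqs, threshold=0.2)
grouped_controls = group_rare(control_freqs, threshold=0.2)

print("case entropy:", shannon_entropy(grouped_cases))
print("control entropy:", shannon_entropy(grouped_controls))
```

Pooling rare categories lowers the number of cells that a downstream test has to compare, which is the dimension-reduction role the abstract assigns to the grouping step.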

Figures

Figure 1.
Examples of joint genotype distributions of cases and controls within one gene under the null hypothesis of no association. (A) The bar plots of sorted joint genotype frequencies in both cases and controls. Categories with frequencies below the horizontal line will be grouped. (B) Joint genotype distribution in cases. (C) Joint genotype distribution in controls. (D) Plot of the penalized entropy measure (PEM) against the threshold. The horizontal line in A labels the categories to be retained when the maximal amount of PEM is achieved. (E and F) The grouped joint genotype distributions in cases and controls, respectively. Here, the numbers on the x-axis represent the categories of joint genotypes. Data are generated using the MS program with sample size 200.
Figure 2.
Null distributions of the test statistic Tgene from the simulated 200 cases and 200 controls with 10 joint genotypes in a gene. χ²(9) indicates a χ²-distribution with 9 d.f.
Figure 3.
Power comparison of gene-based and SNP-based genomewide association studies as a function of genotype relative risk (GRR) under two single-locus disease models, the additive model (A) and the multiplicative model (B). The risk allele frequencies at both loci are 0.30, and the numbers of individuals in both cases and controls are 800, genotyped on 1000 genes with a population prevalence of 0.1.
Figure 4.
Power comparison of gene-based and SNP-based genomewide association studies as a function of genotypic effect (GE) under two two-locus disease models, model 1 (A) and model 2 (B) defined in Table 3. The numbers of individuals in both cases and controls are 800, genotyped on 1000 genes.
Figure 5.
Power comparison of gene-based and SNP-based genomewide association studies as a function of genotypic effect (GE) under two three-locus disease models, model 1 (A) and model 2 (B) defined in Equation 5. The numbers of individuals in both cases and controls are 800, genotyped on 1000 genes.
