PLoS One. 2011;6(11):e28145.
doi: 10.1371/journal.pone.0028145. Epub 2011 Nov 29.

Detecting low frequent loss-of-function alleles in genome wide association studies with red hair color as example

Fan Liu et al. PLoS One. 2011.

Abstract

Multiple loss-of-function (LOF) alleles at the same gene may influence a phenotype not only in the homozygote state, when alleles are considered individually, but also in the compound heterozygote (CH) state. Such LOF alleles typically have low frequencies and moderate to large effects. Detecting such variants is of interest to the genetics community, and statistical methods for detecting and quantifying their effects are needed. We present a collapsed double heterozygosity (CDH) test to detect the presence of multiple LOF alleles at a gene. When causal SNPs are available, which may be the case in next-generation genome sequencing studies, the CDH test has much higher power than single-SNP analysis. When causal SNPs are not directly available, as in current GWA settings, we show that the CDH test has higher power than standard single-SNP analysis if tagging SNPs are in linkage disequilibrium with the underlying causal SNPs to at least a moderate degree (r²>0.1). The test is implemented for genome-wide analysis, using a sliding-window approach, in the publicly available software package GenABEL. We provide a proof of principle by conducting a genome-wide CDH analysis of red hair color, a trait known to be influenced by multiple loss-of-function alleles, in a total of 7,732 Dutch individuals with hair color ascertained. The association signals at the MC1R gene locus from CDH were uniformly more significant than those from traditional GWA analyses (the most significant P for CDH = 3.11×10⁻¹⁴² vs. P for rs258322 = 1.33×10⁻⁶⁶). The CDH test will contribute towards finding rare LOF variants in GWAS and sequencing studies.
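The collapsing idea in the abstract can be illustrated with a minimal Python sketch. It is not the GenABEL implementation: the carrier coding (homozygous for the minor allele at any SNP, or heterozygous at two or more SNPs as an unphased proxy for a compound heterozygote), the Pearson chi-squared test on the resulting 2×2 table, and the function names `cdh_carrier`, `chi2_2x2` and `cdh_test` are all assumptions made for illustration.

```python
import math

def cdh_carrier(genotypes):
    """Collapse one individual's genotypes in a window into a 0/1 indicator.

    genotypes: minor-allele counts (0, 1 or 2), one per SNP.
    Assumed coding: 1 if homozygous for the minor allele at any SNP, or
    heterozygous at two or more SNPs (phase is unknown, so this is only a
    proxy for a compound heterozygote); otherwise 0.
    """
    homozygous = any(g == 2 for g in genotypes)
    double_het = sum(1 for g in genotypes if g == 1) >= 2
    return int(homozygous or double_het)

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df) on the table [[a, b], [c, d]].

    Returns (statistic, p_value). For 1 df the upper tail probability
    equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a margin is empty, so there is nothing to test
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))

def cdh_test(window_genotypes, phenotype):
    """Test association between the collapsed indicator and a binary trait.

    window_genotypes: one genotype list per individual.
    phenotype: 0/1 trait values in the same order.
    """
    counts = [[0, 0], [0, 0]]  # counts[carrier][affected]
    for genos, y in zip(window_genotypes, phenotype):
        counts[cdh_carrier(genos)][y] += 1
    (c00, c01), (c10, c11) = counts
    # Rows: carriers, non-carriers; columns: affected, unaffected.
    return chi2_2x2(c11, c10, c01, c00)
```

Calling `cdh_test` on a list of per-individual genotype vectors and a matching 0/1 phenotype list returns the statistic and P value for that window; a single-SNP test, by contrast, would see the two rare alleles separately.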

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. A recessive and compound heterozygote model of the phenotype.
In the left part of the figure (A and B), two rare recessive variants at the same gene locus are assumed to be directly genotyped. In the right part of the figure (C and D), two non-causal SNPs with higher minor allele frequencies and in LD with the causal SNPs are genotyped. The upper part of the figure (A and C) depicts the logarithm-scaled frequency of the cross genotypes of the two variants. The lower part of the figure (B and D) is an example of the genetic model under illustrative parameters: GRR_AA = 8, GRR_AaBb = 7, GRR_BB = 6, r²_ac = r²_bd = 0.1.
Figure 2. The expected P values for the CDH test.
The −log10(P) values for two causal SNPs (on the left part of the figure, A and B) and for the single-SNP chi-squared test (on the right part, C and D) are derived as a function of the genotype relative risk (GRR_AA = GRR_BB = GRR_AaBb, ranging from 1 to 10), the minor allele frequencies (q = q1 = q2, ranging from 0.01 to 0.05 when N is fixed at 10,000; A and C), and the total sample size N (ranging from 6,000 to 10,000 when q is fixed at 0.05; B and D). The baseline prevalence of a binary phenotype is fixed at 5% in all analyses.
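To make the parameters in this caption concrete, the following Python sketch computes how common the collapsed risk genotypes would be and what case and control counts one would expect. It assumes Hardy-Weinberg proportions at each SNP, independence between the two SNPs, and that the 5% baseline is the risk in non-carriers; none of these assumptions is stated in the caption, and the function names are hypothetical.

```python
from itertools import product

def hwe_freqs(q):
    """Genotype frequencies (0, 1, 2 minor alleles) under Hardy-Weinberg."""
    p = 1.0 - q
    return {0: p * p, 1: 2.0 * p * q, 2: q * q}

def carrier_frequency(q1, q2):
    """Frequency of AA, BB or AaBb genotypes.

    Assumes the two SNPs are independent (linkage equilibrium), which is
    a simplification made for this sketch.
    """
    f1, f2 = hwe_freqs(q1), hwe_freqs(q2)
    total = 0.0
    for g1, g2 in product((0, 1, 2), repeat=2):
        if g1 == 2 or g2 == 2 or (g1 == 1 and g2 == 1):
            total += f1[g1] * f2[g2]
    return total

def expected_counts(q, n, baseline, grr):
    """Expected 2x2 table for a sample of size n.

    Returns (carrier cases, carrier controls,
             non-carrier cases, non-carrier controls).
    baseline is treated as the risk in non-carriers (an assumption);
    carriers have risk baseline * grr, capped at 1.
    """
    f = carrier_frequency(q, q)
    risk = min(1.0, baseline * grr)
    carriers = n * f
    others = n * (1.0 - f)
    return (carriers * risk, carriers * (1.0 - risk),
            others * baseline, others * (1.0 - baseline))

# Example with values taken from the caption's ranges.
table = expected_counts(q=0.05, n=10_000, baseline=0.05, grr=10)
```

Under these assumptions only a small fraction of the sample carries a risk genotype, which is why the expected P value depends so strongly on q, N and the GRR.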
Figure 3. The power of CDH and single SNP analysis.
Proportion of P values ≤ 5×10⁻⁸ from the CDH analysis (green dots) and the single-SNP Cochran-Armitage test of two tagging SNPs c (red dots) and d (blue dots). Four SNPs were re-sampled 10,000 times from the Illumina 550 K chip. SNPs a and b were physically close (<200 kb) and had low MAFs (<5%). SNP c was in LD with a and SNP d was in LD with b. The genotypic relative risk was simulated according to the genotypes of a and b under the recessive and compound heterozygote model, where GRR_AA = GRR_BB = GRR_AaBb. The baseline prevalence of a binary phenotype was fixed at 5%. A, when r²_ac × r²_bd ≤ 0.1; B, when 0.1 < r²_ac × r²_bd ≤ 0.5; C, when 0.5 < r²_ac × r²_bd ≤ 0.9; and D, when r²_ac × r²_bd > 0.9.
Figure 4. The power of CDH and WSS.
The power of CDH and the weighted sum statistic (WSS) was plotted against the proportion of causal variants in the sampled region. A region spanning 200 kb was randomly sampled 10,000 times over the Illumina 550 K chip without replacement. For each sampling, a binary trait was simulated by considering a proportion of the rare variants in the region to be causal under the recessive-set model described in . Other parameters were fixed (α = 0.05, n = 10,000, and GRR = 10 for carriers of any homozygote or CH genotype of the causal variants). Four sets of P values were derived when (1) all SNPs in the region were analyzed by CDH (blue), (2) all SNPs with MAF<0.05 were analyzed by WSS (red), (3) all non-causal SNPs were analyzed by CDH (green), and (4) all non-causal variants with MAF<0.05 were analyzed by WSS (purple). The power was defined as the proportion of P values smaller than or equal to 5×10⁻⁸.
Figure 5. Association between SNPs at MC1R and the red hair color in the Rotterdam Study.
The −log10(P) values for association with red hair color were plotted for each genotyped SNP according to its chromosomal position (blue dots) and for the CDH test in each sliding window consisting of 100 SNPs (green dots represent the left-most SNP). The LD patterns in the Rotterdam Study population and in the HapMap CEU samples (release 27) and the known genes in the region were aligned below according to the physical position of the SNPs (genome-build version 36.3). The orange bar indicates the physical position of the MC1R gene. The yellow bar indicates the region between the two SNPs on which the most significant P value of the CDH test was based (the left-most SNP rs258322 and the right-most SNP rs8058895).
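The sliding-window scan described in this caption can be sketched in Python as follows. The window size of 100 SNPs comes from the caption; the step of one SNP, the handling of regions shorter than one window, and the helper names `sliding_windows` and `most_significant` are assumptions for illustration and do not reflect GenABEL's interface.

```python
def sliding_windows(n_snps, window_size=100, step=1):
    """Yield (leftmost_index, snp_indices) for each window along a region.

    The step of 1 SNP is an assumption; a region shorter than window_size
    yields a single, shorter window.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if n_snps <= 0:
        return
    last_start = max(n_snps - window_size, 0)
    for start in range(0, last_start + 1, step):
        yield start, range(start, min(start + window_size, n_snps))

def most_significant(results):
    """Pick the window with the smallest P value.

    results: iterable of (leftmost_index, p_value) pairs, e.g. one pair
    per window produced by some per-window test.
    """
    return min(results, key=lambda pair: pair[1])
```

Each window is reported at its left-most SNP, matching how the green dots are positioned in the figure; the per-window test itself would be supplied separately.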
Figure 6. Frequency of diplotypes and the prevalence of red hair in the Rotterdam Study.
The causal SNP a is rs1805007 and b is rs1805008. The tagging SNP c is rs2011877 and d is rs2302898. Causal alleles A and B are indicated in red color. Common alleles are indicated in green background and minor alleles are indicated in orange background.
