Am J Hum Genet. 2011 Jun 10;88(6):706-717.
doi: 10.1016/j.ajhg.2011.04.023. Epub 2011 May 27.

DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation

Alexander Gusev et al. Am J Hum Genet. 2011.

Abstract

Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies (GWAS) have successfully detected many common causal variants, they are underpowered to identify disease variants that are too rare or population-specific to be imputed from a general reference panel and are therefore poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent (IBD). We have developed an algorithm, DASH, which builds upon pairwise IBD shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph in which nodes represent individuals and links represent such segments spanning a locus, and it uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations, we show this approach to be significantly more powerful than single-marker testing in an isolated population from Kosrae, Federated States of Micronesia, that has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations, five such loci in the WTCCC data and ten in the isolated cohort, that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequence of the cluster carriers.
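The graph construction and iterative minimum cut described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify which minimum cut algorithm or stopping rule DASH uses, so the Stoer-Wagner global minimum cut, the `min_density` threshold, the segment tuple layout `(hap_a, hap_b, start, end)`, and all function names below are assumptions chosen for illustration.

```python
from itertools import combinations

def build_graph(segments, locus):
    """Link two haplotypes when a pairwise IBD segment spans the locus."""
    graph = {}
    for a, b, start, end in segments:
        if a != b and start <= locus <= end:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def components(graph, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    seen, out = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        seen.add(n)
        stack, comp = [n], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in graph[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        out.append(comp)
    return out

def density(graph, nodes):
    """Fraction of possible edges present among `nodes`."""
    n = len(nodes)
    if n < 2:
        return 1.0
    edges = sum(1 for u in nodes for v in graph[u] if v in nodes) / 2
    return edges / (n * (n - 1) / 2)

def min_cut(graph, nodes):
    """Stoer-Wagner global minimum cut of a connected induced subgraph.
    Returns one side of the cut as a set of original nodes."""
    w = {u: {v: 1 for v in graph[u] if v in nodes} for u in nodes}
    members = {u: {u} for u in nodes}
    active = set(nodes)
    best_weight, best_side = float("inf"), None
    while len(active) > 1:
        start = min(active)
        order = [start]
        conn = {v: w[start].get(v, 0) for v in sorted(active) if v != start}
        while conn:
            nxt = max(conn, key=conn.get)   # most tightly connected vertex
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v, wt in w[nxt].items():
                if v in conn:
                    conn[v] += wt
        s, t = order[-2], order[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, set(members[t])
        members[s] |= members[t]            # merge t into s
        for v, wt in w[t].items():
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[v].get(s, 0) + wt
            del w[v][t]
        del w[t]
        active.remove(t)
    return best_side

def dense_clusters(graph, min_density=0.6, min_size=2):
    """Split components along minimum cuts until each piece is dense."""
    clusters, queue = [], components(graph, set(graph))
    while queue:
        comp = queue.pop()
        if len(comp) < min_size:
            continue
        if density(graph, comp) >= min_density:
            clusters.append(comp)
            continue
        side = min_cut(graph, comp)
        queue.extend(components(graph, side))
        queue.extend(components(graph, comp - side))
    return clusters

# Toy data: two groups sharing a haplotype at locus 50, one spurious
# segment linking them, and one segment that does not span the locus.
group_a, group_b = ["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]
segments = [(x, y, 0, 100) for g in (group_a, group_b) for x, y in combinations(g, 2)]
segments += [("a1", "b1", 40, 60), ("a2", "b2", 100, 200)]
graph = build_graph(segments, locus=50)
clusters = dense_clusters(graph)
print(sorted(sorted(c) for c in clusters))
```

In this toy input the single spurious link makes the whole component too sparse to accept as one haplotype cluster, so the sketch cuts it and keeps the two fully connected groups. Each resulting cluster could then be coded as a presence/absence marker and tested for association, as the abstract describes.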

Figures

Figure 1
Method Workflow. A generalized representation of the DASH clustering algorithm across three windows (vertical lines) of a single chromosome. (A) Pairs of haploid individuals (left, colored circles) and their respective identical-by-descent segments, if any. True segments are represented by a thick gray bar spanning at least one window; false positive and false negative regions are labeled and unfilled. (B) The corresponding haplotype graph for each window; the haploid individuals are represented as nodes (circles, colored as in A) and identical-by-descent sharing at the locus is represented as edges (lines); false positive and false negative segments are dashed and dotted lines, respectively. Gray fill shows the most likely dense cluster detected by DASH. (C) The final haplotypes determined by the algorithm for each window; color is consistent with that in (A) and (B).
Figure 2
Method Comparison of Rare-Variant Association Power in One Isolated and One Outbred Cohort. Power to detect a single rare variant was estimated by simulating causal sites at a risk-allele frequency range of 0%–5% with a fixed direct allelic significance of 2.5 × 10^−20. All variants below 5% MAF were subsequently hidden from analysis, and power to detect association with the remaining proxy markers was measured. Tested separately were single markers (yellow, SNP), high-quality imputed markers from the HapMap reference and single markers (green, IMP), DASH haplotypes and single markers (blue, DASH and SNP), and DASH haplotypes and high-quality imputed markers (DASH and IMP). For each method, power was measured as the percentage of variants for which a genome-wide significant proxy was identified (see Material and Methods). (A) Results in the isolated cohort from Kosrae, Federated States of Micronesia (imputed from the JPTCHB reference). (B) Results in the European cohort from the WTCCC data (imputed from the CEU reference).
Figure 3
Method Comparison of Association Power in the Presence of Missing Genotypes and Phasing Error. Power estimates (as in Figure 2) for a causal variant at 2% risk-allele frequency are plotted with increasing levels of missing genotypes and phasing error. For both fault types, three methods are compared: single marker (yellow, SNP), imputation from HapMap JPTCHB (green, IMP), and DASH haplotypes (blue, DASH). Left: power as a function of the percentage of variants excluded at random (filled line) and in increasing order of minor allele frequency (dashed line). Right: power as a function of the probability that a heterozygous site will be switched (filled line) and the probability that a heterozygous site will switch the subsequent haplotype (dashed line); the SNP and IMP methods, which are unaffected by haplotype structure, are shown for comparison.
