BMC Bioinformatics. 2016 Apr 8;17:156.
doi: 10.1186/s12859-016-1006-9.

CollapsABEL: an R library for detecting compound heterozygote alleles in genome-wide association studies

Kaiyin Zhong et al. BMC Bioinformatics. 2016.

Abstract

Background: Compound heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for a substantial proportion of the missing heritability, i.e. the heritability of phenotypes not yet accounted for by single genetic variants. Methods to detect CH-like effects in genome-wide association studies (GWAS) may help explain the missing heritability, but to our knowledge no viable software tools for this purpose are currently available.

Results: In this work we present the Generalized Compound Double Heterozygosity (GCDH) test and its implementation in the R package CollapsABEL. Time-consuming procedures are optimized for computational efficiency using Java or C++. Intermediate results are stored either in an SQL database or in a so-called big.matrix file to keep the memory footprint reasonable. Our large-scale simulation studies show that GCDH is capable of discovering genetic associations due to CH-like interactions with much higher power than a conventional single-SNP approach under various settings, whether or not the causal genetic variants are available. CollapsABEL provides a user-friendly pipeline for genotype collapsing, statistical testing, power estimation, type I error control and graphics generation in the R language.

Conclusions: CollapsABEL provides a computationally efficient solution for screening general forms of CH alleles in densely imputed microarray or whole genome sequencing datasets. The GCDH test provides improved power over single-SNP based methods in detecting the prevalence of CH in human complex phenotypes, offering an opportunity for tackling the missing heritability problem. Binary and source packages of CollapsABEL are available on CRAN ( https://cran.r-project.org/web/packages/CollapsABEL ) and the website of the GenABEL project ( http://www.genabel.org/packages ).

Keywords: Compound heterozygosity; Genome wide association study; Missing heritability; Next generation sequencing.

Figures

Fig. 1
CollapsABEL flowchart
Fig. 2
Genome-shifting algorithm compared with sliding-window algorithm. The genome-shifting algorithm starts with a PLINK binary genotype file (the bed file) and shifts the whole genome one SNP at a time, each time generating a new bed file containing collapsed genotypes. The total number of new bed files is equal to the user-specified window size k. (a) Shift by 1 SNP. (b) Shift by 2 SNPs. The sliding-window algorithm generates collapsed genotypes for all possible combinations of SNP pairs within a window, and at each iteration slides the window forward by one SNP. (c) 1st sliding window. (d) 2nd sliding window.
Fig. 3
Relationship between N, MAF, β and median p-value from the GCDH analysis and single-SNP association analysis. SNP pairs with different MAFs are drawn from 1000-Genomes imputed Rotterdam Study microarray data. Sample sizes are fixed at 8000 or 11,000. Allele effect sizes β range from 0.5 to 1.5. Median p-values for SNPs from different MAF groups are distinguished using different colors. In total, 2750 simulations are conducted.
Fig. 4
GCDH analysis using a simulated phenotype. Genotype data are from the Rotterdam Study (11,496 subjects and 2,744,740 SNPs after restricting MAF to the interval [0.01, 0.1] and keeping only SNPs that are genotyped in every subject). The phenotype is simulated from the collapsed genotype of two randomly selected SNPs (rs138886950 and rs10440104 in this case) with effect size 0.7 plus a random error term from the standard normal distribution, and GCDH is then run on this phenotype. The genome-wide significance threshold is set at 5.0 × 10⁻⁸ for the single-SNP approach (the red horizontal line in the figure); for GCDH it is set empirically at 4.5 × 10⁻⁹ by permutation analysis (the blue horizontal line; see the runTypeI function in CollapsABEL). Window size is set to 55. (a) Genome-wide scan with causal SNPs available. (b) Genome-wide scan without genotypes of causal SNPs. (c) Regional GCDH with causal SNPs available. (d) Regional GCDH without genotypes of causal SNPs.
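
The genome-shifting idea described in the Fig. 2 caption can be illustrated with a small sketch. This is not the CollapsABEL implementation (which is written in R with Java or C++ and operates on PLINK bed files); it is a toy Python version on in-memory lists. The function names are hypothetical, and the collapsing rule (collapsed genotype is 1 when a subject carries at least one minor allele at both SNPs of a pair) and the choice of comparing a shift limit of k with a sliding window of k + 1 consecutive SNPs are assumptions made for illustration only.

```python
from itertools import combinations


def collapse(a, b):
    """Assumed collapsing rule: 1 if both SNPs carry a minor allele, else 0."""
    return 1 if a > 0 and b > 0 else 0


def genome_shift(genotypes, k):
    """Collapse every SNP with the SNP s positions downstream, for s = 1..k.

    genotypes: list of per-subject rows of minor-allele counts (0, 1 or 2).
    Returns {shift: collapsed matrix}, one matrix per shift, mirroring the
    "one new file per shift" description in the caption.
    """
    n_snps = len(genotypes[0])
    shifted = {}
    for s in range(1, min(k, n_snps - 1) + 1):
        shifted[s] = [
            [collapse(row[i], row[i + s]) for i in range(n_snps - s)]
            for row in genotypes
        ]
    return shifted


def shift_pairs(n_snps, k):
    """SNP index pairs visited by genome shifting with shifts 1..k."""
    return {(i, i + s) for s in range(1, k + 1) for i in range(n_snps - s)}


def sliding_window_pairs(n_snps, k):
    """SNP index pairs visited by a window of k + 1 SNPs moved one SNP at a time."""
    pairs = set()
    for start in range(n_snps):
        window = range(start, min(start + k + 1, n_snps))
        pairs.update(combinations(window, 2))
    return pairs


# Two subjects, four SNPs (toy data, not from the study).
toy = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
]
collapsed = genome_shift(toy, k=2)
same_pairs = shift_pairs(6, 2) == sliding_window_pairs(6, 2)
```

Under these assumptions both strategies cover the same set of SNP pairs. The difference is in how the work is organised: each shift is a single pass over the whole genome, whereas the sliding window revisits overlapping pairs at every step.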
