Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013;13 Suppl 1(Suppl 1):S3.
doi: 10.1186/1472-6947-13-S1-S3. Epub 2013 Apr 5.

Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data

Affiliations

Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data

Chiyong Kang et al. BMC Med Inform Decis Mak. 2013.

Abstract

Background: Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity.

Methods: We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) considering pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. The prediction error rates are measured for SNP sets from functional module-based filtration that selects SNPs within functional modules from genome-wide SNPs based expanded GSEA.

Results: A T2D causal SNP combination containing 101 SNPs from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset are selected using optimal filtration criteria, with an error rate of 10.25%. Matching 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration record no significance compared to the prediction error rates of randomly selected SNP sets and T2D causal SNP combinations from optimal filtration.

Conclusions: We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Error rates of RF analysis with/without variable selection.
Figure 2
Figure 2
P-values and variable importance values of SNPs from the SNP combination.

Similar articles

Cited by

References

    1. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW. et al.Common SNPs explain a large proportion of the heritability for human height. Nature genetics. 2010;42(7):565–569. doi: 10.1038/ng.608. - DOI - PMC - PubMed
    1. Park JH, Wacholder S, Gail MH, Peters U, Jacobs KB, Chanock SJ, Chatterjee N. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nature genetics. 2010;42(7):570–575. doi: 10.1038/ng.610. - DOI - PMC - PubMed
    1. WTCCC Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661–678. doi: 10.1038/nature05911. - DOI - PMC - PubMed
    1. Wu TT, Chen YF, Hastie T, Sobel E, Lange K. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics. 2009;25(6):714–721. doi: 10.1093/bioinformatics/btp041. - DOI - PMC - PubMed
    1. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ. Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies. PLoS Genet. 2008;4(7):e1000130. doi: 10.1371/journal.pgen.1000130. - DOI - PMC - PubMed

Publication types

MeSH terms