BMC Med Genet. 2012 Jan 25;13:7.
doi: 10.1186/1471-2350-13-7.

Data mining of high density genomic variant data for prediction of Alzheimer's disease risk

Natalia Briones et al. BMC Med Genet.

Abstract

Background: Discovering genetic associations is important for understanding human illness and deriving disease pathways. Identifying multiple interacting genetic mutations associated with disease remains a challenge in studying the etiology of complex diseases. Although genome-wide association studies (GWAS) have recently confirmed that new single nucleotide polymorphisms (SNPs) at genes implicated in immune response, cholesterol/lipid metabolism, and cell membrane processes are associated with late-onset Alzheimer's disease (LOAD), part of AD heritability remains unexplained. We used data mining methods to search for other genetic variants that may influence LOAD risk.

Methods: We devised two approaches to select SNPs associated with LOAD in a publicly available GWAS data set consisting of three cohorts. In both approaches, single-locus analysis (logistic regression) was used to filter the data at a p-value threshold less conservative than the Bonferroni threshold; the resulting subset of SNPs was then used in multi-locus analysis with random forest (RF). In the second approach, we took prior biological knowledge into account and performed sample stratification and linkage disequilibrium (LD) analysis, in addition to logistic regression analysis, to preselect loci as input to the RF classifier construction step.
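
The single-locus filtering step can be illustrated with a minimal sketch. It compares a Bonferroni threshold with a relaxed cutoff and keeps the SNPs that pass each one. The SNP identifiers, p-values, number of tests, and relaxed cutoff below are all hypothetical and are not taken from the study, which does not state these values in the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def prefilter_snps(p_values, threshold):
    """Return SNP ids whose single-locus p-value is below threshold, most significant first."""
    kept = [snp for snp, p in p_values.items() if p < threshold]
    return sorted(kept, key=lambda snp: p_values[snp])


# Hypothetical single-locus p-values (illustrative only, not study results).
p_values = {
    "rs_a": 1e-9,
    "rs_b": 3e-5,
    "rs_c": 4e-4,
    "rs_d": 0.02,
    "rs_e": 0.6,
}

n_tests = 500_000                              # assumed number of SNPs tested
strict = bonferroni_threshold(0.05, n_tests)   # about 1e-7
relaxed = 1e-3                                 # assumed, less conservative cutoff

strict_set = prefilter_snps(p_values, strict)    # candidates under Bonferroni
relaxed_set = prefilter_snps(p_values, relaxed)  # larger subset passed to multi-locus analysis
```

With the relaxed cutoff, SNPs with modest single-locus signal survive the filter and remain available to a multi-locus method that can pick up joint effects.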

Results: The first approach yielded 199 SNPs, mostly associated with genes in calcium signaling, cell adhesion, endocytosis, immune response, and synaptic function. These SNPs, together with APOE and GAB2 SNPs, formed a predictive subset for LOAD status with an average error of 9.8% using 10-fold cross validation (CV) in RF modeling. The second approach identified nineteen variants in LD with ST5, TRPC1, ATG10, ANO3, NDUFA12, and NISCH, genes linked directly or indirectly with neurobiology. These variants were part of a model that also included APOE and GAB2 SNPs to predict LOAD risk; this model produced a 10-fold CV average error of 17.5% in the classification modeling.
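
As a rough illustration of how a 10-fold CV average error is computed, the sketch below splits the samples into ten folds, trains on nine, scores the held-out fold, and averages the ten error rates. The RF classifier is not reproduced here: a majority-class predictor stands in for it, and the function names and toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_average_error(X, y, fit, predict, k=10, seed=0):
    """Mean misclassification rate over k held-out folds."""
    errors = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(predict(model, X[i]) != y[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k


# Placeholder classifier (assumption): always predicts the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Toy data: 100 samples with one dummy genotype each, 30 cases and 70 controls.
X = [[0] for _ in range(100)]
y = [1] * 30 + [0] * 70

avg_error = cv_average_error(X, y, fit_majority, predict_majority, k=10)
```

Any classifier exposing a fit and predict pair could be passed in place of the placeholder; the averaging over folds stays the same.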

Conclusions: With the two proposed approaches, we identified a large subset of SNPs in genes mostly clustered around specific pathways/functions and a smaller set of SNPs, within or in proximity to five genes not previously reported, that may be relevant for the prediction/understanding of AD.

Figures

Figure 1. RF performance assessment with different numbers of features and the number of trees fixed at 100; approach I.
Figure 2. RF tuning: best number of attributes at different numbers of trees; approach I. F = number of features.
