Brief Bioinform. 2024 Nov 22;26(1):bbae704.
doi: 10.1093/bib/bbae704.

Genome-wide association neural networks identify genes linked to family history of Alzheimer's disease

Upamanyu Ghose et al. Brief Bioinform.

Abstract

Augmenting traditional genome-wide association studies (GWAS) with advanced machine learning algorithms can allow the detection of novel signals in available cohorts. We introduce "genome-wide association neural networks (GWANN)", a novel approach that uses neural networks (NNs) to perform a gene-level association study with family history of Alzheimer's disease (AD). In UK Biobank, we defined cases (n = 42 110) as those with AD or family history of AD and sampled an equal number of controls. The data were split 80:20 into training and testing samples; GWANN was trained on the former, and associated genes were identified from its performance on the latter. Our method identified 18 genes to be associated with family history of AD. APOE, BIN1, SORL1, ADAM10, APH1B, and SPI1 have been identified by previous AD GWAS. Among the 12 new genes, PCDH9, NRG3, ROR1, LINGO2, SMYD3, and LRRC7 have been associated with neurofibrillary tangles or phosphorylated tau in previous studies. Furthermore, there is evidence for differential transcriptomic or proteomic expression between AD and healthy brains for 10 of the 12 new genes. A series of post hoc analyses resulted in a significantly enriched protein-protein interaction network (P-value < 1 × 10−16), and enrichment of relevant disease and biological pathways such as focal adhesion (P-value = 1 × 10−4), extracellular matrix organization (P-value = 1 × 10−4), Hippo signaling (P-value = 7 × 10−4), Alzheimer's disease (P-value = 3 × 10−4), and impaired cognition (P-value = 4 × 10−3). Applying NNs for GWAS illustrates their potential to complement existing algorithms and methods and enable the discovery of new associations without the need to expand existing cohorts.
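The sampling and splitting step described in the abstract can be illustrated with a minimal sketch. It assumes simple random sampling of controls and a plain shuffled split; the abstract does not say whether controls were matched or whether the split was stratified, and the function name `balanced_split` and its parameters are hypothetical.

```python
import random

def balanced_split(case_ids, control_pool_ids, train_frac=0.8, seed=0):
    """Sample as many controls as cases, then split into train/test sets.

    Assumption: uniform random sampling and a shuffled 80:20 split.
    The study's actual procedure may differ.
    """
    rng = random.Random(seed)
    case_ids = list(case_ids)
    # Draw an equal number of controls without replacement.
    controls = rng.sample(list(control_pool_ids), k=len(case_ids))
    # Label cases 1 and controls 0.
    samples = [(i, 1) for i in case_ids] + [(i, 0) for i in controls]
    rng.shuffle(samples)
    n_train = int(round(train_frac * len(samples)))
    return samples[:n_train], samples[n_train:]

# Toy usage with made-up identifiers (not UK Biobank data).
train, test = balanced_split(range(100), range(1000, 2000))
print(len(train), len(test))
```

With 42 110 cases and an equal number of controls, the same arithmetic would give 84 220 samples in total, of which 80% would go to training.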

Keywords: Alzheimer’s disease; GWAS; UK Biobank; artificial intelligence; machine learning; neural networks.

Figures

Figure 1
NN architecture used in the GWANN method. The top-left branch generates a 1D encoding from the SNP input, while the bottom-left branch does so for the covariate input. The right trunk merges the encodings of both branches to output whether the input belongs to cases or controls.
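The two-branch layout in this caption can be sketched as a toy forward pass. This is only an illustration of the branch-and-merge idea: the layer types, sizes, activations, and the class name `TwoBranchNet` are assumptions, not the published GWANN architecture, and the weights are random rather than trained.

```python
import math
import random

def dense(x, weights, bias, activation):
    """Fully connected layer on plain lists: activation(W @ x + b)."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class TwoBranchNet:
    """Toy network: SNP branch + covariate branch -> merged trunk."""

    def __init__(self, n_snps, n_covs, enc_dim=4, seed=0):
        rng = random.Random(seed)
        self.snp_branch = init_layer(rng, n_snps, enc_dim)
        self.cov_branch = init_layer(rng, n_covs, enc_dim)
        self.trunk = init_layer(rng, 2 * enc_dim, 1)

    def forward(self, snps, covs):
        snp_enc = dense(snps, *self.snp_branch, relu)  # 1D SNP encoding
        cov_enc = dense(covs, *self.cov_branch, relu)  # 1D covariate encoding
        merged = snp_enc + cov_enc                     # concatenate encodings
        # Single sigmoid unit: probability that the input is a case.
        return dense(merged, *self.trunk, sigmoid)[0]

# Toy input: 5 SNP dosages (0/1/2) and 2 covariates, all made up.
net = TwoBranchNet(n_snps=5, n_covs=2)
print(net.forward([0, 1, 2, 1, 0], [0.3, 1.0]))
```

Keeping the covariates in their own branch means the trunk sees a separate encoding for each input type before they are combined, which is the structural point the caption makes.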
Figure 2
Manhattan plot after running GWANN on family history of AD. Significant hits were identified at an empirically determined threshold of P-value < 1 × 10−25 (top dotted line). After calculating the LD between significant genes, the gene with the best NLL within an LD block was identified as the hit gene. P-values lower than 6.95 × 10−159 have been cropped to a value of 6.95 × 10−160. The bottom dotted line marks the Bonferroni-corrected threshold for the number of gene windows that were tested, P-value = 7.06 × 10−7.
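The Bonferroni arithmetic behind the bottom dotted line can be checked with a short sketch. The family-wise error rate of 0.05 is an assumption (the caption does not state it), and the number of gene windows is back-calculated from the reported threshold rather than taken from the paper.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def implied_n_tests(alpha, threshold):
    """Number of tests implied by a reported Bonferroni threshold."""
    return round(alpha / threshold)

ALPHA = 0.05          # assumed family-wise error rate
REPORTED = 7.06e-7    # threshold quoted in the Figure 2 caption

n_windows = implied_n_tests(ALPHA, REPORTED)
print(n_windows)                                 # implied gene windows
print(bonferroni_threshold(ALPHA, n_windows))    # recovers ~7.06e-7
```

Under that assumption the threshold corresponds to roughly 70 800 gene windows. The empirical threshold of 1 × 10−25 used for hits is far stricter than this correction.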
Figure 3
Overlap of GWANN hits with previous studies. (a) Heatmap showing the count of previous GWAS in which the GWANN hits were associated with the phenotypes on the x-axis. (b) Heatmap showing the presence of significant evidence for the terms on the x-axis for the GWANN hits. (c) Heatmap showing the overlap between GWANN hits (GWANN), a GWAS run using PLINK 2.0 on the same data as GWANN (TradGWAS), and the largest European AD GWAS (EADB GWAS) [12]. (d) Similar heatmap to (c), but instead of the GWANN and TradGWAS hits, the top 100 genes from both methods were considered for the overlap with the EADB GWAS hit genes. The sample size of each method is given on the x-axis of the heatmaps, and the diagonals show the number of genes of each method considered while calculating the overlap.
Figure 4
Post hoc enrichment analysis after GWANN analysis. (a–d) Gene set enrichment analysis for (a) Reactome, (b) Wiki, (c) KEGG, and (d) GO using GWANN summary metrics. (e, f) Genes were ranked according to the metric 1 – NLL_NN, where NLL_NN was the negative log likelihood of the neural network for a given gene. (e) Disease and trait enrichment using the top 100 genes. (f) Enriched PPI network (P-value < 1 × 10−16) for the top 100 genes. The colors within the nodes highlight the trait categories enriched by the protein encoded by the gene.
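The ranking metric in panels (e, f) can be illustrated with a minimal sketch. It assumes the NLL is the mean binary cross-entropy over test samples, which the caption does not specify; the gene labels and probabilities are made-up toy values, and `mean_nll` and `rank_genes` are hypothetical helper names.

```python
import math

def mean_nll(labels, probs, eps=1e-12):
    """Mean negative log likelihood (binary cross-entropy) of predictions."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def rank_genes(nll_by_gene, top_k=100):
    """Rank genes by 1 - NLL, highest first, and keep the top_k."""
    scores = {gene: 1.0 - nll for gene, nll in nll_by_gene.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy test-set labels and per-gene predicted probabilities (made up).
labels = [1, 0, 1, 0]
nll_by_gene = {
    "GENE_A": mean_nll(labels, [0.9, 0.1, 0.8, 0.2]),  # informative model
    "GENE_B": mean_nll(labels, [0.5, 0.5, 0.5, 0.5]),  # chance-level model
}
print(rank_genes(nll_by_gene))
```

A lower NLL gives a higher 1 – NLL score, so genes whose networks separate cases from controls better rise to the top of the ranking.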

References

    1. Gauthier S, Rosa-Neto P, Morais JA, et al. World Alzheimer Report 2021: journey through the diagnosis of dementia. 2021.
    2. Querfurth HW, LaFerla FM. Alzheimer’s disease. N Engl J Med 2010;362:329–44. 10.1056/NEJMra0909142.
    3. Buniello A, MacArthur JAL, Cerezo M, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019;47:D1005–12. 10.1093/nar/gky1120.
    4. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747–53. 10.1038/nature08494.
    5. Visscher PM, Brown MA, McCarthy MI, et al. Five years of GWAS discovery. Am J Hum Genet 2012;90:7–24. 10.1016/j.ajhg.2011.11.029.