Review. Front Aging Neurosci. 2022 Nov 18;14:1027224. doi: 10.3389/fnagi.2022.1027224. eCollection 2022.

Deep learning approaches for noncoding variant prioritization in neurodegenerative diseases

Alexander Y Lan et al. Front Aging Neurosci. .

Abstract

Determining how noncoding genetic variants contribute to neurodegenerative dementias is fundamental to understanding disease pathogenesis, improving patient prognostication, and developing new clinical treatments. Next-generation sequencing technologies have produced vast amounts of genomic data on cell type-specific transcription factor binding, gene expression, and three-dimensional chromatin interactions, with the promise of providing key insights into the biological mechanisms underlying disease. However, these data are highly complex, making them challenging for researchers to interpret, assimilate, and dissect. To this end, deep learning has emerged as a powerful tool for genome analysis that can capture the intricate patterns and dependencies within these large datasets. In this review, we organize and discuss the many unique model architectures, development philosophies, and interpretation methods that have emerged in the last few years with a focus on using deep learning to predict the impact of genetic variants on disease pathogenesis. We highlight both broadly applicable genomic deep learning methods that can be fine-tuned to disease-specific contexts as well as existing neurodegenerative disease research, with an emphasis on Alzheimer's-specific literature. We conclude with an overview of the future of the field at the intersection of neurodegeneration, genomics, and deep learning.

Keywords: gene regulation; genomics; machine learning; neurodegeneration; noncoding genetic variation.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Model and layer architectures. (A) Diagram of the fully-connected architecture present in ANNs. Every node is connected to all nodes of the previous layer and all nodes of the following layer. (B) Diagram of a single convolutional filter within a single convolutional layer. Every element in the shaded input matrix is multiplied by the corresponding weight in the convolutional filter, and the products are combined into one output value in the shaded output square. (C) Depiction of the recurrent neural network architecture, where the primary ANN block takes the current input along with memory information stored over short or long distances. (D) Flowchart of the transformer multi-head attention layer, which first takes a list of inputs and passes them through three ANN blocks. Together, the query and key matrix outputs form attention filters, which, when multiplied with the outputs of the value matrix, generate a list of filtered output matrices. Each attention filter may highlight a different part of the input. The final output ANN reduces the number of dimensions back to the original input size.
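The attention computation in Figure 1D can be sketched numerically. The following is a minimal single-head example in NumPy, not taken from any model in the review; the weight matrices W_q, W_k, and W_v (the three "ANN blocks") are random illustrative values, and the softmax of the query-key similarities plays the role of the attention filter that weights the value outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """Single attention head: inputs X of shape (n, d) -> outputs (n, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # query, key, value projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # scaled pairwise similarities
    weights = softmax(scores, axis=-1)            # attention filter; rows sum to 1
    return weights @ V                            # weighted mix of value outputs

rng = np.random.default_rng(0)
n, d = 5, 8                                       # 5 input positions, 8 features
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = attention(X, W_q, W_k, W_v)
print(out.shape)                                  # same shape as the input: (5, 8)
```

A real transformer runs several such heads in parallel and concatenates their outputs before the final projection that restores the input dimensionality, as the caption describes.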
Figure 2
Sample genomics DL model with convolutional, attention, and intermediate layers. This model representation captures the basic architecture used by most genomics DL models. The input DNA sequence is first one-hot encoded into the 4-by-N matrix shown on the left; then a convolutional layer extracts certain patterns by traversing the input sequence with multiple filters, whose weights are learned during training. Both standard and dilated convolutional layers are shown. Along with more convolutional or attention layers, model designers often use intermediate layers to simplify computation, consolidate data representations, or learn more patterns. Examples of intermediate layers include fully-connected, RNN, cropping, flatten, or pooling layers. Lastly, the model outputs either a predicted genomic track, as shown, or a single label representing the amount of enriched signal for the entire sequence.
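The input step in Figure 2 is straightforward to sketch. Below is a hypothetical NumPy example, assuming the common A/C/G/T row ordering: a DNA string is one-hot encoded into the 4-by-N matrix, and a single convolutional filter (here a toy identity filter that acts as an "ACGT" motif detector, not a trained weight matrix) is slid across it to produce a position-wise output track.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a 4-by-N matrix (rows ordered A, C, G, T)."""
    mat = np.zeros((4, len(seq)))
    for i, base in enumerate(seq):
        mat[BASES.index(base), i] = 1.0
    return mat

def conv1d(x, filt):
    """Valid cross-correlation of a (4, N) input with a single (4, k) filter."""
    k = filt.shape[1]
    return np.array([np.sum(x[:, i:i + k] * filt)
                     for i in range(x.shape[1] - k + 1)])

x = one_hot("ACGTACGT")      # shape (4, 8)
filt = np.eye(4)             # toy 4-by-4 filter: fires on the motif "ACGT"
track = conv1d(x, filt)
print(track)                 # [4. 0. 0. 0. 4.] -- peaks where "ACGT" occurs
```

Real models learn many such filters jointly and stack further layers on top, but the encoding and sliding-window operation are exactly this.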
