Review

Deep learning in next-generation sequencing

Bertil Schmidt et al. Drug Discov Today. 2021 Jan;26(1):173-180.
doi: 10.1016/j.drudis.2020.10.002. Epub 2020 Oct 12.

Abstract

Next-generation sequencing (NGS) methods lie at the heart of large parts of biological and medical research. Their fundamental importance has created a continuously increasing demand for methods to process and analyze the data sets produced, addressing questions such as variant calling, metagenomic classification and quantification, genomic feature detection, or downstream analysis in larger biological or medical contexts. In addition to classical algorithmic approaches, machine-learning (ML) techniques are often used for such tasks. In particular, deep learning (DL) methods that use multilayered artificial neural networks (ANNs) for supervised, semisupervised, and unsupervised learning have gained significant traction for such applications. Here, we highlight important network architectures, application areas, and DL frameworks in an NGS context.


Figures

Figure 1
Overview of ANN architectures: (a) An artificial neuron maps an input vector x_i, 0 ≤ i ≤ n, to a scalar output y by applying a nonlinear activation function φ to a weighted sum s := ∑_{i=0}^{n} w_i x_i = wᵀx. (b) A multilayer perceptron (MLP) comprising an input layer, a fully connected hidden layer, and an output layer. (c) A single layer of a convolutional neural network (CNN), where matrix multiplication is replaced by a convolution with a small filter kernel matrix, the entries of which are learned during training, followed by a ReLU activation function and (max) pooling. (d) Recurrent neural networks (RNNs) feature feedback connections to earlier layers and can be trained to learn time-dependent relations. (e) Autoencoders (AEs) are designed to identify useful data encodings in an unsupervised setting. (f) Generative adversarial networks (GANs) train two networks simultaneously. The generator produces new data points, whereas the discriminator classifies data points as either genuine or fake.
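The artificial neuron in panel (a) and the fully connected layer in panel (b) can be sketched directly from the caption's formula y = φ(wᵀx). The following is a minimal illustration only (not from the article); the function names `relu`, `neuron`, and `dense_layer` are our own, and ReLU stands in for the generic activation φ:

```python
import numpy as np

def relu(s):
    # ReLU activation: phi(s) = max(0, s), as used in the CNN panel (c)
    return np.maximum(0.0, s)

def neuron(x, w):
    # Artificial neuron, panel (a): weighted sum s := sum_i w_i * x_i = w^T x,
    # passed through a nonlinear activation phi (here ReLU)
    s = np.dot(w, x)
    return relu(s)

def dense_layer(x, W):
    # Fully connected MLP layer, panel (b): each row of W is one neuron's
    # weight vector, so the layer computes phi(W x) elementwise
    return relu(W @ x)

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
print(neuron(x, w))  # 0.5*1 - 0.25*2 + 0.1*3 = 0.3
```

In practice a layer also adds a bias term and the weights are learned by gradient descent; this sketch only mirrors the forward computation described in the caption.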

