Review
Nat Genet. 2019 Jan;51(1):12-18.
doi: 10.1038/s41588-018-0295-5. Epub 2018 Nov 26.

A primer on deep learning in genomics

James Zou et al. Nat Genet. 2019 Jan.

Abstract

Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.

Conflict of interest statement

Competing interests

M.H. is an employee of Peltarion.

Figures

Fig. 1 | Deep learning workflow in genomics.
a, A dataset should be randomly split into training, validation and test sets. The positive and negative examples should be balanced for potential confounders (for example, sequence content and location) so that the predictor learns salient features rather than confounders. b, The appropriate architecture is selected and trained on the basis of domain knowledge. For example, CNNs capture translation invariance, and RNNs capture more flexible spatial interactions. c, True positive (TP), false positive (FP), false negative (FN) and true negative (TN) rates are evaluated. When there are more negative than positive examples, precision and recall are often considered. d, The learned model is interpreted by computing how changing each nucleotide in the input affects the prediction. The interactive tutorial illustrates the four steps of this workflow (see URLs).
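Three of the steps in this caption (the random split in a, the precision and recall evaluation in c, and the per-nucleotide interpretation in d) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the paper's interactive tutorial: `toy_score` is a hypothetical stand-in for a trained model (it only checks for the motif `TATA`), and the function names, split fractions and example sequence are illustrative choices that do not come from the source.

```python
import random

BASES = "ACGT"


def split_dataset(examples, frac_train=0.8, frac_val=0.1, seed=0):
    """Randomly split examples into training, validation and test lists.

    The 80/10/10 fractions are an assumption made for this sketch.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_val = int(len(shuffled) * frac_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def precision_recall(y_true, y_pred):
    """Compute precision and recall from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def toy_score(seq):
    """Hypothetical stand-in for a trained model: 1.0 if 'TATA' occurs."""
    return 1.0 if "TATA" in seq else 0.0


def in_silico_mutagenesis(seq, score_fn):
    """For each position and alternative base, record the change in score."""
    base_score = score_fn(seq)
    effects = []
    for i, ref in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref:
                continue  # skip the reference base itself
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = score_fn(mutant) - base_score
        effects.append(row)
    return effects


if __name__ == "__main__":
    train, val, test = split_dataset(range(10))
    print(len(train), len(val), len(test))
    print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
    for pos, row in enumerate(in_silico_mutagenesis("GGTATAGG", toy_score)):
        print(pos, row)
```

With a real model, `score_fn` would wrap the trained network's prediction for a one-hot encoded sequence. Positions whose mutations shift the score most are the ones the model relies on.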
Fig. 2 | Applications of deep learning in genomics.
The boxes highlight several application domains and references discussed in the text. Image adapted with permission from ref. , Springer Nature.

