Review

An Introductory Review of Deep Learning for Prediction Models With Big Data

Frank Emmert-Streib et al. Front Artif Intell. 2020 Feb 28;3:4. doi: 10.3389/frai.2020.00004. eCollection 2020.

Abstract

Deep learning models represent a new learning paradigm in artificial intelligence (AI) and machine learning. Recent breakthrough results in image analysis and speech recognition have generated massive interest in this field, because applications in many other domains that provide big data also seem possible. On the downside, the mathematical and computational methodology underlying deep learning models is very challenging, especially for interdisciplinary scientists. For this reason, we present in this paper an introductory review of deep learning approaches, including Deep Feedforward Neural Networks (D-FFNN), Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Autoencoders (AEs), and Long Short-Term Memory (LSTM) networks. These models form the major core architectures of deep learning currently in use and should belong in any data scientist's toolbox. Importantly, these core architectural building blocks can be composed flexibly, in an almost Lego-like manner, to build new application-specific network architectures. Hence, a basic understanding of these network architectures is important for being prepared for future developments in AI.

Keywords: artificial intelligence; data science; deep learning; machine learning; neural networks; prediction models.


Figures

Figure 1
Number of publications as a function of the publication year for DL, deep learning; CNN, convolutional neural network; DBN, deep belief network; LSTM, long short-term memory; AEN, autoencoder; and MLP, multilayer perceptron. The legend shows the search terms used to query the Web of Science publication database. The two dashed lines are scaled by a factor of 5 (deep learning) and 3 (convolutional neural network).
Figure 2
(A) Representation of a mathematical artificial neuron model. The input to the neuron is summed up and filtered by the activation function ϕ (for examples, see Table 1). (B) Simplified representation of an artificial neuron model. Only the key elements are depicted, i.e., the input, the output, and the weights.
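
As a rough illustration of the neuron model in Figure 2, the following minimal Python sketch computes the weighted sum of the inputs and applies a sigmoid as one possible choice for the activation function ϕ; the inputs, weights, and bias are made-up values for illustration only.

    import numpy as np

    def sigmoid(z):
        # One possible choice for the activation function phi (see Table 1 for others).
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b):
        # Weighted sum of the inputs plus a bias, passed through the activation phi.
        return sigmoid(np.dot(w, x) + b)

    # Illustrative (made-up) inputs, weights, and bias.
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.2])
    print(neuron(x, w, b=0.3))
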
Figure 3
Two examples of feedforward neural networks. (A) A shallow FFNN. (B) A deep feedforward neural network (D-FFNN) with three hidden layers.
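
To make the distinction in Figure 3 concrete, the sketch below performs a forward pass through a D-FFNN with three hidden layers, as in panel (B). This is only an illustration, not the authors' implementation; the layer sizes, random weights, and ReLU activation are assumptions.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def forward(x, layers):
        # Each layer is a (weight matrix, bias vector) pair; the activation is
        # applied after every layer.
        for W, b in layers:
            x = relu(W @ x + b)
        return x

    rng = np.random.default_rng(0)
    sizes = [8, 16, 16, 16, 3]   # input, three hidden layers, output (illustrative sizes)
    layers = [(rng.normal(size=(m, n)), np.zeros(m)) for n, m in zip(sizes[:-1], sizes[1:])]
    print(forward(rng.normal(size=8), layers))
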
Figure 4
(A) An example of a Convolutional Neural Network. The red edges highlight the fact that hidden layers are connected in a "local" way, i.e., only very few neurons connect the succeeding layers. (B) An example of shared weights and local connectivity in a CNN. The labels w1, w2, w3 indicate the weight assigned to each connection; three hidden nodes share the same set of weights w1, w2, w3 when connecting to three local patches.
Figure 5
An example of calculating the values in the activation map. Here, the stride is 1 and the zero-padding is 0. The kernel slides one pixel at a time from left to right, starting from the top-left position; after reaching the border, the kernel moves to the next row and the process is repeated until the whole input is covered. The red area indicates the local patch to be convolved with the kernel, and the result is stored in the green field of the activation map.
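
The calculation described in Figure 5 can be written down directly. The following NumPy sketch slides a kernel over an input with stride 1 and no zero-padding and stores each product-sum in the activation map; the 4x4 input and 2x2 kernel are illustrative values, not taken from the paper.

    import numpy as np

    def activation_map(image, kernel, stride=1):
        # Slide the kernel over the input (no zero-padding) and store each
        # elementwise product-sum in the output "activation map".
        kh, kw = kernel.shape
        oh = (image.shape[0] - kh) // stride + 1
        ow = (image.shape[1] - kw) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[i, j] = np.sum(patch * kernel)
        return out

    image = np.arange(16).reshape(4, 4)      # illustrative 4x4 input
    kernel = np.array([[1, 0], [0, -1]])     # illustrative 2x2 kernel
    print(activation_map(image, kernel))
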
Figure 6
Inception block structure. Multiple such blocks are stacked on top of each other, with the output of one block forming the input of the next.
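
As a rough sketch of an Inception-style block, the Keras snippet below runs several parallel branches with different receptive fields over the same input and concatenates their outputs along the channel dimension, which then serves as input to the next stacked block. The input shape, branch widths, and kernel sizes are illustrative assumptions, and the 1x1 reduction convolutions that precede the larger kernels in the original design are omitted for brevity.

    from tensorflow.keras import Input, Model, layers

    # Illustrative input: a 32x32 feature map with 64 channels (made-up sizes).
    x_in = Input(shape=(32, 32, 64))

    # Parallel branches with different receptive fields.
    b1 = layers.Conv2D(16, (1, 1), padding="same", activation="relu")(x_in)
    b2 = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(x_in)
    b3 = layers.Conv2D(16, (5, 5), padding="same", activation="relu")(x_in)
    b4 = layers.MaxPooling2D((3, 3), strides=1, padding="same")(x_in)

    # Concatenating the branch outputs along the channel axis gives the block
    # output, which becomes the input of the next stacked block.
    block_out = layers.Concatenate()([b1, b2, b3, b4])
    model = Model(x_in, block_out)
    model.summary()
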
Figure 7
The structure of a residual block. Inside a block there can be as many weight layers as desired.
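
The defining feature of a residual block is the shortcut connection that adds the unchanged input back to the output of the weight layers, i.e., y = F(x) + x. A minimal NumPy sketch with two weight layers follows; the feature dimension and random weights are made up for illustration.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def residual_block(x, W1, W2):
        # Two weight layers compute the residual F(x); the shortcut adds the
        # unchanged input back before the final activation: y = relu(F(x) + x).
        fx = W2 @ relu(W1 @ x)
        return relu(fx + x)

    rng = np.random.default_rng(0)
    d = 8                         # illustrative feature dimension
    x = rng.normal(size=d)
    W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    print(residual_block(x, W1, W2))
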
Figure 8
Examples of Boltzmann Machines. (A) The neurons are arranged on a circle. (B) The neurons are separated according to their type. Both Boltzmann Machines are identical and differ only in their visualization. (C) Transition from a Boltzmann Machine (left) to a Restricted Boltzmann Machine (right).
Figure 9
(A) Contrastive Divergence k-step algorithm using Gibbs sampling. (B) Backpropagation algorithm. (C) iRprop+ algorithm.
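
Figure 9A refers to the k-step Contrastive Divergence algorithm. The sketch below shows a single CD-1 parameter update for a binary Restricted Boltzmann Machine using one Gibbs step; the layer sizes, learning rate, and random training vector are illustrative assumptions, not the authors' settings.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def cd1_update(v0, W, b, c, lr=0.1):
        # One step of contrastive divergence (CD-1) for a binary RBM.
        # v: visible units, h: hidden units, W: weights, b/c: visible/hidden biases.
        ph0 = sigmoid(v0 @ W + c)                          # P(h = 1 | v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)   # Gibbs sample of the hidden units
        pv1 = sigmoid(h0 @ W.T + b)                        # P(v = 1 | h0)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)   # reconstructed visible units
        ph1 = sigmoid(v1 @ W + c)                          # P(h = 1 | v1)
        # Approximate gradient: positive phase minus negative phase.
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)
        return W, b, c

    n_visible, n_hidden = 6, 4                             # illustrative sizes
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    v0 = (rng.random(n_visible) < 0.5).astype(float)       # one illustrative binary training vector
    W, b, c = cd1_update(v0, W, b, c)
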
Figure 10
Visualizing the stacking of RBMs in order to learn the parameters Θ of a model in an unsupervised way.
Figure 11
The two stages of DBN learning. (Left) The hidden layer (purple) of one RBM is the input of the next RBM. For this reason, their dimensions are equal. (Right) The two edges in fine-tuning denote the two stages of the backpropagation algorithm: the input feedforwarding and the error backpropagation. The orange layer indicates the output.
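
The left part of Figure 11 corresponds to greedy layer-wise pretraining: each RBM is trained on the hidden activations of the RBM below it. A compact sketch of this stacking, here using scikit-learn's BernoulliRBM purely for illustration (the data, layer sizes, and hyperparameters are made up):

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    rng = np.random.default_rng(0)
    X = (rng.random((100, 32)) < 0.5).astype(float)   # illustrative binary training data

    # Greedy layer-wise pretraining: the hidden activations of one RBM become
    # the visible input of the next, so adjacent layer dimensions must match.
    layer_sizes = [16, 8]                             # illustrative hidden-layer sizes
    stack, data = [], X
    for n_hidden in layer_sizes:
        rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05, n_iter=10, random_state=0)
        data = rbm.fit_transform(data)                # train this RBM, then feed its hidden layer upward
        stack.append(rbm)

    # The stacked weights would then initialize a feedforward network that is
    # fine-tuned with backpropagation on labeled data (the right part of the figure).
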
Figure 12
Visualizing the idea of autoencoder learning. The learned new encoding of the input is represented in the code layer (shown in blue).
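
A minimal autoencoder in the spirit of Figure 12 consists of an encoder that maps the input to a low-dimensional code layer and a decoder that reconstructs the input from it. The Keras sketch below uses illustrative dimensions (64-dimensional input, 8-dimensional code) and a mean-squared-error reconstruction loss as assumptions.

    from tensorflow.keras import Input, Model, layers

    # Illustrative dimensions: a 64-dimensional input compressed to an 8-dimensional code.
    x_in = Input(shape=(64,))
    code = layers.Dense(8, activation="relu")(x_in)        # the code layer (bottleneck)
    x_out = layers.Dense(64, activation="sigmoid")(code)   # reconstruction of the input

    autoencoder = Model(x_in, x_out)
    autoencoder.compile(optimizer="adam", loss="mse")
    # autoencoder.fit(X, X, ...) would train the model to reproduce its own input,
    # forcing the code layer to learn a compressed encoding.
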
Figure 13
(Left) The folded structure of an LSTM network model. (Right) The unfolded structure of an LSTM network model. xi is the input data at time i and yi is the corresponding output (i is the time step starting from (t − 1)). In this network, only yt+2, activated by a softmax function, is the final network output.
Figure 14
Internal connectivity pattern of a standard LSTM unit (blue rectangle). The output from the previous time step, h(t−1), and the current input x(t) are the inputs to the block at time t; the output h(t) at time t then serves as an input to the same block at the next time step (t + 1).
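
The connectivity in Figure 14 can be summarized by the update equations of one LSTM time step: a forget gate, an input gate, a candidate cell state, and an output gate jointly produce the new cell state c(t) and output h(t). The NumPy sketch below implements these equations with illustrative dimensions and random weights.

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U, b hold the parameters of the four gates: forget (f), input (i),
        # candidate (g), and output (o), each acting on x_t and h_prev.
        f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
        i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
        g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate cell state
        o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
        c_t = f * c_prev + i * g                               # new cell state
        h_t = o * np.tanh(c_t)                                 # new hidden state (block output)
        return h_t, c_t

    rng = np.random.default_rng(0)
    n_in, n_hid = 4, 3                                         # illustrative dimensions
    W = {k: rng.normal(size=(n_hid, n_in)) for k in "figo"}
    U = {k: rng.normal(size=(n_hid, n_hid)) for k in "figo"}
    b = {k: np.zeros(n_hid) for k in "figo"}
    h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
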
Figure 15
Internal connectivity of a peephole LSTM unit (blue rectangle). Here x(t) is the input to the cell at time t, and h(t) is its output. The red arrows are the new peephole connections added compared to the standard LSTM in Figure 14.
Figure 16
An LSTM classifier model for text classification. N is the sequence length of the input text (the number of words). The inputs V1 to VN are a sequence of word-embedding vectors fed to the model at different time steps. yN is the final prediction result.
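
A model of the kind shown in Figure 16 can be assembled from standard layers: an embedding layer produces the vectors V1 to VN, an LSTM layer processes the sequence and returns its last hidden state, and a softmax layer produces the final prediction yN. The Keras sketch below is an illustration only; the vocabulary size, embedding dimension, number of units, and number of classes are assumptions.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    model = Sequential([
        Embedding(input_dim=10000, output_dim=64),   # word-embedding vectors V1 ... VN (vocabulary size is illustrative)
        LSTM(128),                                   # processes the sequence; returns the last hidden state
        Dense(2, activation="softmax"),              # final prediction yN (number of classes is illustrative)
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # model.fit(padded_word_index_sequences, labels, ...) would train the classifier.
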
Figure 17
Classification error on the EMNIST data as a function of the number of training samples. The standard errors are shown in red, and the horizontal dashed line corresponds to an error of 5% (reference). The results are averaged over 10 independent runs.
