Review

Probabilistic Models with Deep Neural Networks

Andrés R Masegosa et al. Entropy (Basel). 2021 Jan 18;23(1):117. doi: 10.3390/e23010117.

Abstract

Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to very restricted model classes, where exact or approximate probabilistic inference is feasible. However, developments in variational inference, a general form of approximate probabilistic inference that originated in statistical physics, have enabled probabilistic modeling to overcome these limitations: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computing engines allow probabilistic modeling to be applied to massive data sets. One important practical consequence of these advances is the possibility of including deep neural networks within probabilistic models, thereby capturing complex non-linear stochastic relationships between the random variables. These advances, in conjunction with the release of novel probabilistic modeling toolboxes, have greatly expanded the scope of applications of probabilistic models, and allowed the models to take advantage of the recent strides made by the deep learning community. In this paper, we provide an overview of the main concepts, methods, and tools needed to use deep neural networks within a probabilistic modeling framework.

Keywords: Bayesian learning; deep probabilistic modeling; latent variable models; neural networks; variational inference.
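
As a concrete illustration of the abstract's central idea — a deep neural network supplying the conditional distribution of a probabilistic model — the following minimal sketch draws one sample from a toy generative model p(z)p(x|z) whose likelihood mean is a neural network. The network f_theta, the layer sizes, and the noise scale are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of a probabilistic model whose conditional p(x | z) is
    # parameterized by a neural network (all sizes and weights are illustrative).
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical network weights: latent dim 2 -> hidden 100 -> observed dim 5.
    W1, b1 = rng.normal(size=(100, 2)), np.zeros(100)
    W2, b2 = rng.normal(size=(5, 100)), np.zeros(5)

    def f_theta(z):
        """DNN mapping a latent code z to the mean of p(x | z)."""
        h = np.maximum(0.0, W1 @ z + b1)       # ReLU hidden layer
        return W2 @ h + b2

    # Ancestral sampling from the joint model p(z) p(x | z):
    z = rng.normal(size=2)                      # prior p(z) = N(0, I)
    x = rng.normal(loc=f_theta(z), scale=0.1)   # likelihood p(x | z) = N(f_theta(z), 0.1^2 I)
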


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Structure of the probabilistic model examined in this paper, defined for a sample of size N.
Figure 2
Two-dimensional latent representations resulting from applying probabilistic PCA to: (Left) the iris dataset [48] and (Right) a subset of 1000 instances from the MNIST dataset [49] corresponding to the handwritten digits 1, 2, and 3.
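
As a rough companion to this figure, the sketch below projects the iris data onto a two-dimensional latent space using scikit-learn's PCA, whose principal subspace coincides with the maximum-likelihood probabilistic PCA solution; the paper's exact model, preprocessing, and plotting choices are not reproduced here.

    # Two-dimensional latent representation of the iris data, in the spirit of
    # Figure 2 (left); standard PCA is used as a stand-in for probabilistic PCA.
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    X, y = load_iris(return_X_y=True)
    Z = PCA(n_components=2).fit_transform(X)    # 2-D latent coordinates

    plt.scatter(Z[:, 0], Z[:, 1], c=y)          # color points by species
    plt.xlabel("latent dim 1")
    plt.ylabel("latent dim 2")
    plt.show()
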
Figure 3
Example of a simple computational graph. Square nodes denote operations, and the rest are input nodes. This computational graph encodes the operation f = 3·w + 10, where w is a variable with respect to which we can differentiate.
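
The graph in this figure can be reproduced with any automatic differentiation framework. A sketch in PyTorch (chosen only for illustration; the paper is framework-agnostic) is:

    # The computational graph of Figure 3, built and differentiated with autodiff.
    import torch

    w = torch.tensor(2.0, requires_grad=True)  # input variable we differentiate w.r.t.
    f = 3.0 * w + 10.0                         # operation nodes: multiply, then add
    f.backward()                               # reverse-mode traversal of the graph
    print(f.item(), w.grad.item())             # -> 16.0 and df/dw = 3.0
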
Figure 4
Example of a simple computational graph encoding a neural network with two hidden layers and the squared loss function. Note that each operation node encapsulates a part of the CG encoding the associated operations; for the sake of simplicity, we do not expand the whole CG.
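
A sketch of such a graph, with illustrative layer sizes and data, again using PyTorch: the squared loss becomes one more node of the graph, and reverse-mode differentiation yields gradients for every weight.

    # A two-hidden-layer network whose squared loss is part of the computational graph.
    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(4, 32), nn.ReLU(),   # hidden layer 1
        nn.Linear(32, 32), nn.ReLU(),  # hidden layer 2
        nn.Linear(32, 1),              # output layer
    )
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = ((net(x) - y) ** 2).mean()  # squared loss node
    loss.backward()                    # gradients for every weight in the graph
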
Figure 5
Two-dimensional latent representations of the MNIST dataset resulting from applying: (Left) a standard probabilistic PCA (reproduced from Figure 2 to ease comparison), and (Right) a non-linear probabilistic PCA with an ANN containing a hidden layer of size 100 with a ReLU activation function.
Figure 6
(Left) A stochastic computational graph encoding the function h = E_Z[(Z − 5)^2], where Z ∼ N(μ, 1). (Right) Computational graph processing k samples from Z and producing ĥ, an estimate of E_Z[(Z − 5)^2].
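
The right-hand graph translates almost line by line into code: draw k samples from N(μ, 1), average (Z − 5)^2, and, by writing the samples in reparameterized form, let automatic differentiation also estimate dh/dμ. The values k = 1000 and μ = 0 below are illustrative.

    # Monte Carlo estimate of h = E_Z[(Z - 5)^2] with Z ~ N(mu, 1), as in Figure 6.
    import torch

    k = 1000
    mu = torch.tensor(0.0, requires_grad=True)
    z = mu + torch.randn(k)              # reparameterized samples from N(mu, 1)
    h_hat = ((z - 5.0) ** 2).mean()      # Monte Carlo estimate of the expectation
    h_hat.backward()
    print(h_hat.item(), mu.grad.item())  # roughly 26, and roughly 2*(mu - 5) = -10
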
Figure 7
The top part depicts a probabilistic graphical model using plate notation [8]. The lower part depicts an abstract representation of a stochastic computational graph encoding the model, where the relation between z and x is defined by a DNN with L+1 layers. See Section 4 for details.
Figure 8
SCG representing the ELBO function L(ν). r is distributed according to the variational distribution, r ∼ q(r|ν).
Figure 9
Reparameterized SCG representing the ELBO function L(ν).
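
A sketch of a single-sample reparameterized ELBO estimate of the kind depicted in Figures 8 and 9, for a toy model with a standard normal prior, a DNN likelihood mean, and a fully factorized Gaussian q(z | ν). All dimensions, the names decoder, q_mean, and q_logstd, and the likelihood noise scale are illustrative assumptions, not the paper's exact setup.

    # Single-sample reparameterized ELBO estimate for a toy latent variable model.
    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    decoder = nn.Sequential(nn.Linear(2, 100), nn.ReLU(), nn.Linear(100, 5))
    x = torch.randn(5)                                     # one observation

    q_mean = torch.zeros(2, requires_grad=True)            # variational parameters nu
    q_logstd = torch.zeros(2, requires_grad=True)

    eps = torch.randn(2)                                   # noise independent of nu
    z = q_mean + eps * q_logstd.exp()                      # reparameterized sample of z

    log_p = (Normal(0.0, 1.0).log_prob(z).sum()            # log p(z)
             + Normal(decoder(z), 0.1).log_prob(x).sum())  # log p(x | z)
    log_q = Normal(q_mean, q_logstd.exp()).log_prob(z).sum()
    elbo = log_p - log_q                                   # single-sample ELBO estimate
    elbo.backward()                                        # gradients w.r.t. nu (and decoder)
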
Figure 10
Two-dimensional latent representation of the MNIST dataset resulting from applying: (Left) a non-linear probabilistic PCA, and (Right) a VAE. The ANNs of the non-linear PCA and those defining the VAE’s decoder and encoder each contain a single hidden layer of size 100.
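
A minimal VAE matching the caption's description — encoder and decoder with a single hidden layer of size 100 — might look as follows. The MNIST-like input dimension of 784 and the two-dimensional latent space are assumptions for illustration; training (maximizing the ELBO over a dataset) is omitted.

    # Minimal VAE sketch: the encoder amortizes the variational parameters of q(z | x).
    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        def __init__(self, x_dim=784, z_dim=2, hidden=100):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * z_dim))  # mean and log-std of q(z|x)
            self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, x_dim))      # mean of p(x|z)

        def forward(self, x):
            mean, logstd = self.enc(x).chunk(2, dim=-1)
            z = mean + torch.randn_like(mean) * logstd.exp()        # reparameterized sample
            return self.dec(z), mean, logstd

    x = torch.rand(16, 784)                  # a batch of flattened images
    recon, mean, logstd = VAE()(x)           # the 2-D latent codes live in `mean`
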

References

    1. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers; San Mateo, CA, USA: 1988.
    2. Lauritzen S.L. Propagation of probabilities, means, and variances in mixed graphical association models. J. Am. Stat. Assoc. 1992;87:1098–1108. doi: 10.1080/01621459.1992.10476265.
    3. Russell S.J., Norvig P. Artificial Intelligence: A Modern Approach. Pearson; Upper Saddle River, NJ, USA: 2016.
    4. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. Springer; Berlin/Heidelberg, Germany: 2001.
    5. Bishop C.M. Pattern Recognition and Machine Learning. Springer; Berlin/Heidelberg, Germany: 2006.
