Review

An Elementary Introduction to Information Geometry

Frank Nielsen. Entropy (Basel). 2020 Sep 29;22(10):1100. doi: 10.3390/e22101100.

Abstract

In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences. The exposition is self-contained by concisely introducing the necessary concepts of differential geometry. Proofs are omitted for brevity.

Keywords: Bayesian hypothesis testing; Fisher–Rao distance; Hessian manifolds; affine connection; conjugate connections; curvature and flatness; differential geometry; dual metric-compatible parallel transport; dually flat manifolds; exponential family; gauge freedom; information manifold; metric compatibility; metric tensor; mixed parameterization; mixture clustering; mixture family; parameter divergence; separable divergence; statistical divergence; statistical invariance; statistical manifold; α-embeddings.


Conflict of interest statement

The author declares no conflict of interest.

Figures

Figure A1
Illustration of the chordal slope lemma.
Figure 1
The parameter inference θ̂ of a model from data D can also be interpreted as a decision-making problem: decide which parameter of a parametric family of models M = {m_θ}_{θ∈Θ} best suits the data. Information geometry provides a differential-geometric structure on manifold M which is useful for designing and studying statistical decision rules.
Figure 2
Primal basis (red) and reciprocal basis (blue) of an inner product ⟨·,·⟩ space. The primal and reciprocal bases are mutually orthogonal: e^1 is orthogonal to e_2, and e^2 is orthogonal to e_1.
Figure 3
Illustration of the parallel transport of vectors on tangent planes along a smooth curve. For a smooth curve c, with c(0) = p and c(1) = q, a vector v_p ∈ T_p is parallel-transported smoothly to a vector v_q ∈ T_q such that for any t ∈ [0,1], we have v_{c(t)} ∈ T_{c(t)}.
Figure 4
Parallel transport with respect to the metric connection: the curvature effect can be visualized as the angle defect along parallel transport around smooth (infinitesimal) loops. On a curved manifold (e.g., a sphere), a vector parallel-transported along a loop does not coincide with itself, while it always coincides with itself on a flat manifold (e.g., a cylinder).
Figure 5
Differential-geometric concepts associated to an affine connection ∇ and a metric tensor g.
Figure 6
Dual Pythagorean theorems in a dually flat space.
Figure 7
Five concentric pairs of dual Itakura–Saito circles.
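The divergence behind these dual circles is the Itakura–Saito divergence, the Bregman divergence induced by the Burg negative entropy F(x) = −Σ_i log x_i on positive vectors. A minimal sketch (the function name is illustrative, not from the paper):

```python
import math

def itakura_saito(p, q):
    """Itakura-Saito divergence D_IS(p:q) = sum_i (p_i/q_i - log(p_i/q_i) - 1),
    the Bregman divergence of the Burg negative entropy F(x) = -sum_i log(x_i),
    defined for vectors with positive coordinates."""
    return sum(pi / qi - math.log(pi / qi) - 1.0 for pi, qi in zip(p, q))

p, q = [1.0, 2.0], [3.0, 1.0]
print(itakura_saito(p, p))  # 0.0: the divergence vanishes iff both arguments coincide
print(itakura_saito(p, q), itakura_saito(q, p))  # positive and generally unequal
```

The asymmetry D_IS(p:q) ≠ D_IS(q:p) is what produces the pairs of dual (right-sided and left-sided) circles in the figure.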
Figure 8
Common dually flat spaces associated to smooth and strictly convex generators.
Figure 9
Visualizing the Cramér–Rao lower bound: the red ellipses display the Fisher information matrix of normal distributions N(μ,σ²) at grid locations. The black ellipses are sample covariance matrices centered at the sample means, computed by repeating 200 runs of sampling 100 iid variates for the normal parameters at each grid location.
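The experiment in this figure can be reproduced numerically: for N(μ,σ²), the Fisher information matrix in (μ,σ) coordinates is diag(1/σ², 2/σ²), and the Cramér–Rao bound says the covariance of an unbiased estimator from n samples is bounded below by I(θ)⁻¹/n. A sketch of the Monte Carlo experiment (pure-Python names and helpers are illustrative, not the paper's code):

```python
import math
import random

def fisher_information_normal(sigma):
    # Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates.
    return [[1.0 / sigma ** 2, 0.0], [0.0, 2.0 / sigma ** 2]]

def mle_variances(mu, sigma, n=100, runs=200, seed=0):
    # Repeat `runs` experiments of n iid draws and estimate the variance
    # of the MLE pair (sample mean, sample standard deviation).
    rng = random.Random(seed)
    ests = []
    for _ in range(runs):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
        ests.append((m, s))
    mbar = sum(m for m, _ in ests) / runs
    sbar = sum(s for _, s in ests) / runs
    var_m = sum((m - mbar) ** 2 for m, _ in ests) / runs
    var_s = sum((s - sbar) ** 2 for _, s in ests) / runs
    return var_m, var_s

var_m, var_s = mle_variances(0.0, 1.0)
# Cramer-Rao floor for n = 100, sigma = 1: sigma^2/n = 0.01 and sigma^2/(2n) = 0.005.
print(var_m, var_s)  # both land near these floors, as the black ellipses illustrate
```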
Figure 10
A divergence satisfies the property of information monotonicity iff D(θ_Ā : θ′_Ā) ≤ D(θ : θ′). Here, parameter θ represents a discrete distribution, and Ā denotes a coarse-grained partition of its support.
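Information monotonicity can be checked concretely for the Kullback–Leibler divergence: coarse-graining a discrete distribution (merging bins of its support according to the partition Ā) never increases the divergence. A small sketch:

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence between discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def coarse_grain(p, partition):
    # Merge probability mass according to a partition of the support indices.
    return [sum(p[i] for i in block) for block in partition]

p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
partition = [[0, 1], [2, 3]]  # lump the four bins pairwise
fine = kl(p, q)
coarse = kl(coarse_grain(p, partition), coarse_grain(q, partition))
print(coarse <= fine)  # True: coarse-graining never increases the divergence
```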
Figure 11
Overview of the main types of information manifolds with their relationships in information geometry.
Figure 12
Statistical Bayesian hypothesis testing: the best Maximum A Posteriori (MAP) rule assigns an observation to the class that yields the maximum likelihood.
Figure 13
Exact geometric characterization (not necessarily in closed form) of the best exponent error rate α*.
Figure 14
Geometric characterization of the best exponent error rate in the multiple hypothesis testing case.
Figure 15
Example of a mixture family of order D=2 (three prescribed components: Laplacian, Gaussian, and Cauchy distributions).
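A mixture family of order D=2 fixes three component densities and lets two mixture weights vary freely; the third weight is determined by normalization. A sketch with standard Laplace, Gaussian, and Cauchy components (the standard parameters are an illustrative choice matching the caption, not the paper's exact setup):

```python
import math

# Three prescribed component densities (standard parameters assumed).
def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))

def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def mixture_pdf(x, theta1, theta2):
    # Order D=2 mixture family: two free weights (theta1, theta2);
    # the remaining weight is fixed as 1 - theta1 - theta2.
    w0 = 1.0 - theta1 - theta2
    return w0 * laplace_pdf(x) + theta1 * gaussian_pdf(x) + theta2 * cauchy_pdf(x)

# Crude Riemann sum over [-50, 50]: the density integrates to roughly 1
# (the heavy Cauchy tails leave a little mass outside the window).
step = 0.01
total = sum(mixture_pdf(-50.0 + i * step, 0.2, 0.3) * step for i in range(10000))
print(total)
```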
Figure 16
Example of w-GMM clustering into k=2 clusters.
Figure 17
Principled classes of distances/divergences.

