Review. Entropy (Basel). 2022 Feb 13;24(2):269. doi: 10.3390/e24020269.

A Short Review on Minimum Description Length: An Application to Dimension Reduction in PCA

Vittoria Bruni et al. Entropy (Basel). 2022.

Abstract

The minimum description length (MDL) is a powerful criterion for model selection that is gaining increasing interest from both theorists and practitioners. It allows the best model for representing data to be selected automatically, without requiring a priori information about them. It balances data fit against model complexity, selecting the model that provides the shortest coding length among a predefined set of models. In this paper, we briefly review the basic ideas underlying the MDL criterion and its applications in different fields, with particular reference to the dimension reduction problem. As an example, the role of MDL in the selection of the best principal components in the well-known PCA is investigated.
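
As a rough illustration of the two-part coding idea, the following Python sketch selects the number of principal components with the classic Wax-Kailath MDL formula. This criterion is in the same spirit as, but not identical to, the stochastic complexity sc(X;k) used in the paper; the function name and the exact penalty term are illustrative assumptions, not the paper's Equation (5).

import numpy as np

def mdl_num_components(X):
    # Wax-Kailath style MDL score for the number of retained components.
    # X is an (n_samples, n_features) data matrix; returns the order that
    # minimises the two-part code length, together with the full score list.
    n, p = X.shape
    Xc = X - X.mean(axis=0)                                   # center the data
    eig = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending eigenvalues
    eig = np.clip(eig, 1e-12, None)                           # guard against log(0)
    scores = []
    for k in range(p):                                        # candidate model orders
        tail = eig[k:]                                        # discarded eigenvalues
        geo = np.exp(np.mean(np.log(tail)))                   # geometric mean
        ari = np.mean(tail)                                   # arithmetic mean
        fit = -n * (p - k) * np.log(geo / ari)                # data-coding (fit) term
        penalty = 0.5 * k * (2 * p - k) * np.log(n)           # model-complexity term
        scores.append(fit + penalty)                          # two-part code length
    return int(np.argmin(scores)), scores

For a data matrix X of shape (n_samples, n_features), mdl_num_components(X)[0] plays the role of the argmin over k that Figures 1-4 locate for sc(X;k).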

Keywords: classification; dimension reduction; features extraction; minimum description length; principal component analysis.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. TEST 1. (Top) Red dotted line: sc(X;k) versus the number of components k for N=5, P=25; the minimum is correctly attained at k=5. Blue dashed line: sc(X;k) versus the number of components k for N=10, P=20; the minimum is correctly attained at k=10. (Bottom) The same plot with log(sc(X;k)) to improve readability near the minimum.

Figure 2. TEST 2. Plot of sc(X;k) and its components versus the number of components k. (Top) Non-normalized data. (Bottom) Normalized data.

Figure 3. TEST 2. (Top) Plot of sc(X;k) and its components versus the number of components k. Signals have been uniformly sampled so that the dimension of X is n×m = 66×90. (Bottom) Plot of log(sc(X;k)).

Figure 4. TEST 3. Plot of sc(X;k) and its components versus the number of components k; the minimum is attained at k=22.

Figure 5. TEST 3. (Left) Ground-truth Indian Pines image; (Middle) classification image using the best result of PCA-SVM in [78]; (Right) classification image using the PCA-SVM method with the number of components estimated via the stochastic complexity, as in Equation (5).
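
To make the PCA-SVM setup of TEST 3 concrete, here is a minimal, hypothetical scikit-learn sketch of such a pipeline, reusing the mdl_num_components helper sketched after the abstract. The synthetic data, class labels, and kernel choice are placeholders; this does not reproduce the Indian Pines experiment or the paper's Equation (5).

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic low-rank data standing in for the hyperspectral pixels;
# in the paper, X holds the spectral bands and y the ground-truth classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 30)) + 0.1 * rng.normal(size=(1000, 30))
y = rng.integers(0, 4, size=1000)

k_hat, _ = mdl_num_components(X)      # model order from the MDL-style criterion above
k_hat = max(k_hat, 1)                 # PCA needs at least one component

clf = make_pipeline(
    StandardScaler(),                 # normalization (cf. Figure 2)
    PCA(n_components=k_hat),          # keep only the selected components
    SVC(kernel="rbf"),                # SVM stage of the PCA-SVM pipeline
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))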
