Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017;59(3):361-370.
doi: 10.1080/00401706.2016.1247017. Epub 2017 Apr 27.

A Geometric Approach to Archetypal Analysis and Nonnegative Matrix Factorization

Affiliations

A Geometric Approach to Archetypal Analysis and Nonnegative Matrix Factorization

Anil Damle et al. Technometrics. 2017.

Abstract

Archetypal analysis and non-negative matrix factorization (NMF) are staples in a statisticians toolbox for dimension reduction and exploratory data analysis. We describe a geometric approach to both NMF and archetypal analysis by interpreting both problems as finding extreme points of the data cloud. We also develop and analyze an efficient approach to finding extreme points in high dimensions. For modern massive datasets that are too large to fit on a single machine and must be stored in a distributed setting, our approach makes only a small number of passes over the data. In fact, it is possible to obtain the NMF or perform archetypal analysis with just two passes over the data.

Keywords: convex hull; group lasso; random projections.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Percentage of experiments in which the algorithm correctly identified the first k rows of X as the rows of H in a separable NMF. Here H is a matrix with independent entries each of which is uniform on [0, 1].
Figure 2
Figure 2
Percentage of experiments in which the algorithm correctly identified the first k rows of X as the rows of H in a separable NMF. Here H is the first k rows of a p × p Hilbert matrix.
Figure 3
Figure 3
XW̃ H̃F/n on a log10 scale for various ε and m when using a majority vote scheme to select .
Figure 4
Figure 4
Number of votes received for each row, sorted and normalized by the largest number of votes received. Each row of the image represents a distinct instance of the experiment, and each block of 100 rows corresponds to a given noise level, ε.
Figure 5
Figure 5
XW̃ H̃F/n on a log10 scale for various ε and m when using the group LASSO to select the rows of .
Figure 6
Figure 6
The performance of PPI and PPI + lasso versus that of SPA in terms of approximation error and recovery of true extreme points.

References

    1. Arora S, Ge R, Kannan R, Moitra A. Computing a nonnegative matrix factorization – provably; Proceedings of the 44th symposium on Theory of Computing; 2012. pp. 145–162.
    1. Ball K. An elementary introduction to modern convex geometry. Flavors of geometry. 1997;31:1–58.
    1. Bittorf V, Recht B, Re C, Tropp J. Factoring nonnegative matrices with linear programs. Advances in Neural Information Processing Systems. 2012;25:1223–1231.
    1. Boardman J. Geometric mixture analysis of imaging spectrometry data. Geoscience and Remote Sensing Symposium (IGARSS) ’94. 1994;4:2369–2371. vol.4.
    1. Chang C-I, Plaza A. A fast iterative algorithm for implementation of pixel purity index. Geoscience and Remote Sensing Letters, IEEE. 2006;3:63–67.

LinkOut - more resources