Principal component analysis: a review and recent developments
- PMID: 26953178
- PMCID: PMC4792409
- DOI: 10.1098/rsta.2015.0202
Abstract
Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
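To make the eigenvalue/eigenvector connection in the abstract concrete, the short NumPy sketch below (not part of the original article; the `pca` helper and variable names are illustrative) computes principal components from the eigendecomposition of the sample covariance matrix and projects the data onto them to obtain the new, uncorrelated variables.

```python
import numpy as np

def pca(X, n_components=2):
    """Classical PCA via eigendecomposition of the sample covariance matrix.

    X: (n_samples, n_features) data matrix.
    Returns the scores (projections onto the components), the components
    (eigenvectors), and the variances they capture (eigenvalues).
    """
    # Centre each variable; PCA maximizes variance about the mean.
    Xc = X - X.mean(axis=0)

    # Sample covariance matrix of the variables.
    cov = np.cov(Xc, rowvar=False)

    # eigh exploits the symmetry of the covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Sort by decreasing eigenvalue so the first component has maximal variance.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Project the centred data onto the leading eigenvectors: these scores are
    # the successively variance-maximizing, mutually uncorrelated variables.
    scores = Xc @ eigenvectors[:, :n_components]
    return scores, eigenvectors[:, :n_components], eigenvalues[:n_components]

# Example: 200 observations of 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
scores, components, variances = pca(X, n_components=2)
print(variances)                      # variance captured by each retained component
print(np.cov(scores, rowvar=False))   # off-diagonal entries ~0: scores are uncorrelated
```

In practice the same components are usually obtained from the singular value decomposition of the centred data matrix, which is numerically more stable, but the covariance-eigendecomposition route shown here matches the formulation described in the abstract.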
Keywords: dimension reduction; eigenvectors; multivariate analysis; principal component analysis.
© 2016 The Author(s).