Review

Principal component analysis: a review and recent developments

Ian T Jolliffe et al. Philos Trans A Math Phys Eng Sci. 2016 Apr 13;374(2065):20150202. doi: 10.1098/rsta.2015.0202.

Abstract

Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
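
The eigenvalue/eigenvector formulation mentioned above can be computed directly. Below is a minimal sketch in R (the language named in the figure 2 caption), using simulated data rather than the fossil teeth or SLP datasets analysed in the article:

    ## Simulated stand-in data: 50 observations of 4 variables.
    set.seed(1)
    X <- matrix(rnorm(200), nrow = 50, ncol = 4)

    ## Correlation-matrix PCA: eigendecomposition of cor(X).
    eig      <- eigen(cor(X))
    loadings <- eig$vectors               # columns are the PC loading vectors
    scores   <- scale(X) %*% loadings     # uncorrelated PC scores, one row per observation

    ## Proportion of total variance accounted for by each component.
    eig$values / sum(eig$values)

    ## The same components (up to the sign of each PC) via the built-in prcomp().
    pca <- prcomp(X, scale. = TRUE)
    summary(pca)

Because each eigenvector is only defined up to sign, different software can return components with flipped coordinates, as noted in the caption of figure 1.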

Keywords: dimension reduction; eigenvectors; multivariate analysis; palaeontology.


Figures

Figure 1. The two-dimensional principal subspace for the fossil teeth data. The coordinates in either or both PCs may switch signs when different software is used.

Figure 2. Biplot for the fossil teeth data (correlation matrix PCA), obtained using R’s biplot command (see the sketch after this figure list). (Online version in colour.)

Figure 3. (a,b) The first two correlation-based EOFs for the SLP data account for 21% and 13% of total variation. (Adapted from [36].)

Figure 4. (a,b) LASSO-based simplified EOFs for the SLP data. Grey areas are grid-points with exactly zero loadings. (Adapted from [36].)
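
The figure 2 caption notes that the biplot was produced with R’s biplot command applied to a correlation-matrix PCA. A hedged sketch of that kind of call, again using placeholder data rather than the article’s fossil teeth measurements:

    ## Placeholder data; the article's fossil teeth measurements are not reproduced here.
    set.seed(2)
    X <- matrix(rnorm(120), nrow = 30, ncol = 4,
                dimnames = list(NULL, paste0("var", 1:4)))

    pca <- prcomp(X, scale. = TRUE)    # scale. = TRUE gives a correlation-matrix PCA
    biplot(pca)                        # observations as points, variables as arrows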

References

    1. Pearson K. 1901. On lines and planes of closest fit to systems of points in space. Phil. Mag. 2, 559–572. (doi:10.1080/14786440109462720)
    2. Hotelling H. 1933. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441, 498–520. (doi:10.1037/h0071325)
    3. Jackson JE. 1991. A user’s guide to principal components. New York, NY: Wiley.
    4. Jolliffe IT. 2002. Principal component analysis, 2nd edn. New York, NY: Springer-Verlag.
    5. Diamantaras KI, Kung SY. 1996. Principal component neural networks: theory and applications. New York, NY: Wiley.
