Review

Principal component analysis: a review and recent developments

Ian T Jolliffe et al. Philos Trans A Math Phys Eng Sci. 2016 Apr 13;374(2065):20150202. doi: 10.1098/rsta.2015.0202.

Abstract

Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
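
The eigenvalue/eigenvector formulation mentioned above can be computed directly. Below is a minimal sketch in R (the language named in the figure 2 caption), using simulated data rather than the fossil teeth or SLP datasets analysed in the article:

    ## Simulated stand-in data: 50 observations of 4 variables.
    set.seed(1)
    X <- matrix(rnorm(200), nrow = 50, ncol = 4)

    ## Correlation-matrix PCA: eigendecomposition of cor(X).
    eig      <- eigen(cor(X))
    loadings <- eig$vectors               # columns are the PC loading vectors
    scores   <- scale(X) %*% loadings     # uncorrelated PC scores, one row per observation

    ## Proportion of total variance accounted for by each component.
    eig$values / sum(eig$values)

    ## The same components (up to the sign of each PC) via the built-in prcomp().
    pca <- prcomp(X, scale. = TRUE)
    summary(pca)

Because each eigenvector is only defined up to sign, different software can return components with flipped coordinates, as noted in the caption of figure 1.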

Keywords: dimension reduction; eigenvectors; multivariate analysis; palaeontology.


Figures

Figure 1. The two-dimensional principal subspace for the fossil teeth data. The coordinates in either or both PCs may switch signs when different software is used.

Figure 2. Biplot for the fossil teeth data (correlation matrix PCA), obtained using R’s biplot command (see the sketch after this figure list). (Online version in colour.)

Figure 3. (a,b) The first two correlation-based EOFs for the SLP data account for 21% and 13% of total variation. (Adapted from [36].)

Figure 4. (a,b) LASSO-based simplified EOFs for the SLP data. Grey areas are grid-points with exactly zero loadings. (Adapted from [36].)
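
The figure 2 caption notes that the biplot was produced with R’s biplot command applied to a correlation-matrix PCA. A hedged sketch of that kind of call, again using placeholder data rather than the article’s fossil teeth measurements:

    ## Placeholder data; the article's fossil teeth measurements are not reproduced here.
    set.seed(2)
    X <- matrix(rnorm(120), nrow = 30, ncol = 4,
                dimnames = list(NULL, paste0("var", 1:4)))

    pca <- prcomp(X, scale. = TRUE)    # scale. = TRUE gives a correlation-matrix PCA
    biplot(pca)                        # observations as points, variables as arrows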

References

    1. Pearson K. 1901. On lines and planes of closest fit to systems of points in space. Phil. Mag. 2, 559–572. (doi:10.1080/14786440109462720)
    2. Hotelling H. 1933. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441, 498–520. (doi:10.1037/h0071325)
    3. Jackson JE. 1991. A user’s guide to principal components. New York, NY: Wiley.
    4. Jolliffe IT. 2002. Principal component analysis, 2nd edn. New York, NY: Springer-Verlag.
    5. Diamantaras KI, Kung SY. 1996. Principal component neural networks: theory and applications. New York, NY: Wiley.
