Editorial
Ann Transl Med. 2017 Sep;5(17):351.
doi: 10.21037/atm.2017.07.12.

Principal components analysis in clinical studies


Zhongheng Zhang et al. Ann Transl Med. 2017 Sep.

Abstract

In multivariate analysis, independent variables are often correlated with one another, which can introduce multicollinearity into regression models. One approach to this problem is to apply principal components analysis (PCA) to these variables. PCA uses an orthogonal transformation to represent a set of potentially correlated variables as principal components (PCs) that are linearly uncorrelated. The PCs are ordered so that the first PC has the largest possible variance, and each subsequent PC has the largest remaining variance subject to being orthogonal to the preceding ones; only the first few components are then retained to represent the correlated variables. As a result, the dimension of the variable space is reduced. This tutorial illustrates how to perform PCA in the R environment, using a simulated dataset in which two PCs account for the majority of the variance in the data. The visualization of PCA results is also highlighted.
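
The procedure described in the abstract can be sketched in a few lines. The example below is a minimal illustration written in Python with NumPy rather than the R environment the tutorial uses, and it is not the authors' code or dataset. The function names `simulate` and `pca`, the factor weights, the noise level, and the sample size are all assumptions chosen so that two latent factors drive five correlated variables (named x1 to x5 here only for illustration).

```python
import numpy as np


def simulate(n=200, seed=0):
    """Hypothetical data: five variables driven by two latent factors plus noise."""
    rng = np.random.default_rng(seed)
    f1 = rng.normal(size=n)          # first latent factor
    f2 = rng.normal(size=n)          # second latent factor
    noise = rng.normal(scale=0.3, size=(n, 5))
    # x1..x3 depend on f1, x4..x5 depend on f2 (weights are arbitrary assumptions)
    signal = np.column_stack([f1, 0.9 * f1, -0.8 * f1, f2, 0.7 * f2])
    return signal + noise


def pca(X):
    """PCA by eigendecomposition of the correlation matrix of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    # Center and scale each column so that every variable has unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)            # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder so PC1 has the largest variance
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    scores = Z @ eigvecs                      # observations expressed in PC coordinates
    explained = eigvals / eigvals.sum()       # proportion of variance per PC
    return eigvals, eigvecs, scores, explained


X = simulate()
eigvals, loadings, scores, explained = pca(X)

print("Proportion of variance per PC:", np.round(explained, 3))
print("Cumulative proportion:        ", np.round(np.cumsum(explained), 3))
```

The columns of `loadings` are the eigenvectors of the correlation matrix, so each entry gives the weight and sign with which one variable contributes to one PC. Because the eigenvectors are orthogonal, the columns of `scores` are uncorrelated with one another, which is the property that makes PCs usable as regression inputs when the original variables are collinear.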

Keywords: Principal component analysis; R; multicollinearity; regression.

Conflict of interest statement

Conflicts of Interest: The authors have no conflicts of interest to declare.

Figures

Figure 1. Schematic illustration of how principal components analysis works.
Figure 2. Scree plot representing the variances of all principal components.
Figure 3. Graphical display of multivariate data with a biplot.
Figure 4. Graphical display of multivariate data with a biplot.
Figure 5. Component loadings that characterize the strength and sign of the association of each independent variable (x1–x5) with each principal component (PC1–PC5).

