Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Nov;12(6):714-22.
doi: 10.1093/bib/bbq090. Epub 2011 Jan 17.

Principal component analysis based methods in bioinformatics studies

Affiliations

Principal component analysis based methods in bioinformatics studies

Shuangge Ma et al. Brief Bioinform. 2011 Nov.

Abstract

In analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic study with gene expression measurements as a representative example but note that analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and their applications in bioinformatics data analysis. Second, we describe recent 'non-standard' applications of PCA, including accommodating interactions among genes, pathways and network modules and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including the supervised PCA, sparse PCA and functional PCA. The supervised PCA and sparse PCA have been shown to have better empirical performance than the standard PCA. The functional PCA can analyze time-course gene expression data. Last, we raise the awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and more importantly its most recent development, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.

PubMed Disclaimer

References

    1. Wong S. The Practical Bioinformatician. World Scientific Publishing Company; 2004.
    1. Ma S, Huang J. Penalized feature selection and classification in bioinformatics. Briefings in Bioinformatics. 2008;9:392–403. - PMC - PubMed
    1. Jolliffe IT. Principal Component Analysis. New York: Springer; 1986.
    1. Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. Upper Saddle River, NJ: Prentice Hall; 2001.
    1. Knudsen S. Cancer Diagnostics with DNA Microarrays. Hoboken, NJ: John Wiley and Sons; 2006.

Publication types

MeSH terms