Bioinformatics. 2011 Nov 1;27(21):3029-35.
doi: 10.1093/bioinformatics/btr522. Epub 2011 Sep 19.

Sparse non-negative generalized PCA with applications to metabolomics


Genevera I Allen et al. Bioinformatics. 2011.

Abstract

Motivation: Nuclear magnetic resonance (NMR) spectroscopy has been used to study mixtures of metabolites in biological samples. This technology produces a spectrum for each sample depicting the chemical shifts at which an unknown number of latent metabolites resonate. Interpreting these data with common multivariate exploratory methods such as principal components analysis (PCA) is limited by the high dimensionality of the data, the non-negativity of the underlying spectra and the dependencies between adjacent chemical shifts.
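To make the limitation concrete, here is a minimal sketch of ordinary PCA via the SVD applied to simulated non-negative "spectra". The toy data, peak positions and the function name `pca_loadings` are our own illustrative assumptions, not the paper's data or code. The loading vectors it returns are dense, putting a nonzero weight on every chemical-shift bin. At least one of them also mixes positive and negative weights, so neither reads naturally as a non-negative metabolite spectrum.

```python
import numpy as np


def pca_loadings(X, k):
    """First k PCA loading vectors (as columns) and singular values of X.

    Rows of X are samples; columns are features such as chemical-shift bins.
    """
    Xc = X - X.mean(axis=0)  # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T, s[:k]


# Toy data (illustrative only): 20 non-negative "spectra" over 50 bins,
# mixtures of two Gaussian-shaped peaks plus small non-negative noise.
rng = np.random.default_rng(0)
shifts = np.arange(50)
peaks = np.vstack([
    np.exp(-0.5 * ((shifts - 15) / 2.0) ** 2),
    np.exp(-0.5 * ((shifts - 35) / 2.0) ** 2),
])
concentrations = rng.uniform(0.0, 1.0, size=(20, 2))
X = concentrations @ peaks + 0.05 * rng.uniform(size=(20, 50))

V, s = pca_loadings(X, 2)
print("nonzero weights per PC:", np.count_nonzero(V, axis=0))
print("negative weights per PC:", (V < 0).sum(axis=0))
```

Because the two loading vectors are orthonormal, if the first happens to be all one sign the second must contain both signs.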

Results: We develop a novel modification of PCA suited to the analysis of NMR data, called Sparse Non-Negative Generalized PCA. This method yields interpretable principal components and loading vectors that select important features and directly account for both the non-negativity of the underlying spectra and the dependencies between adjacent chemical shifts. By reanalyzing experimental NMR data on five purified neural cell types, we demonstrate the utility of our methods for dimension reduction, pattern recognition, sample exploration and feature selection. Our methods identify novel metabolites that reflect the differences between these cell types.
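The sketch below shows one way a single sparse non-negative generalized component could be computed, under stated assumptions. We assume a rank-one formulation that maximizes u'QXRv - lam*||v||_1 subject to u'Qu <= 1, v'Rv <= 1 and v >= 0. Here Q (n x n) and R (p x p) are positive definite quadratic operators, and R is meant to encode dependencies between adjacent chemical shifts. The alternating updates, the Gaussian-kernel choice of R (`smoothing_operator`) and every function name are our own illustration, not the authors' released software or their exact algorithm.

```python
import numpy as np


def smoothing_operator(p, bandwidth=1.0):
    """Hypothetical R: Gaussian kernel over chemical-shift index, so that
    neighbouring bins are treated as dependent. Illustrative choice only."""
    idx = np.arange(p)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / bandwidth) ** 2)


def nonneg_lasso_cd(R, b, lam, v, n_sweeps=200, tol=1e-10):
    """Coordinate descent for min_v 0.5 v'Rv - b'v + lam*sum(v) s.t. v >= 0.

    R must be symmetric positive definite; v is a warm start (copied).
    """
    v = v.copy()
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(v.size):
            # b_j minus the contribution of every other coordinate
            partial = b[j] - R[j] @ v + R[j, j] * v[j]
            new_vj = max(0.0, (partial - lam) / R[j, j])
            max_change = max(max_change, abs(new_vj - v[j]))
            v[j] = new_vj
        if max_change < tol:
            break
    return v


def sgpca_rank_one(X, Q, R, lam, max_iter=200, tol=1e-8):
    """One sparse non-negative generalized PC (assumed formulation, a sketch).

    Returns (u, v, d) with u'Qu = 1, v'Rv = 1 and v >= 0 (or v = 0 if lam
    removes every feature).
    """
    n, p = X.shape
    # Start from |leading right singular vector|, rescaled to v'Rv = 1.
    v = np.abs(np.linalg.svd(X, full_matrices=False)[2][0])
    v = v / np.sqrt(v @ R @ v)
    u = np.zeros(n)
    for _ in range(max_iter):
        # u-step: maximise u'Q(XRv) subject to u'Qu <= 1.
        xrv = X @ (R @ v)
        u_norm = np.sqrt(xrv @ Q @ xrv)
        if u_norm == 0:
            return np.zeros(n), np.zeros(p), 0.0
        u = xrv / u_norm
        # v-step: min 0.5*||a - v||_R^2 + lam*sum(v), v >= 0, with a = X'Qu.
        a = X.T @ (Q @ u)
        v_hat = nonneg_lasso_cd(R, R @ a, lam, v)
        v_norm = np.sqrt(v_hat @ R @ v_hat)
        if v_norm == 0:  # penalty too large: every feature dropped
            return u, np.zeros(p), 0.0
        v_new = v_hat / v_norm
        done = np.linalg.norm(v_new - v) < tol
        v = v_new
        if done:
            break
    return u, v, float(u @ Q @ X @ R @ v)


# Usage on toy non-negative data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(size=(12, 30))
u, v, d = sgpca_rank_one(X, np.eye(12), smoothing_operator(30), lam=0.5)
print("features selected:", np.count_nonzero(v), "of", v.size)
```

The non-negativity constraint turns the L1 penalty into a plain linear term, which is why each coordinate update is a simple clipped division. Larger `lam` drives more loadings exactly to zero.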

Availability: www.stat.rice.edu/~gallen/software.html.

Contact: gallen@rice.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Scatter plots of normalized sample PCs for the neural cell types data. Results from PCA, GPCA, Sparse Non-Negative PCA (SPCA) and Sparse Non-Negative GPCA (SGPCA) are compared for the five neural cell types. Sparse methods (bottom rows) demonstrate clearer separation of samples from different cell types.
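For the non-sparse GPCA panels, a minimal sketch of how normalized sample PCs can be obtained follows. It assumes a generalized matrix decomposition view of GPCA in which X is approximated by U diag(d) V' with U'QU = I and V'RV = I, for positive definite Q and R. Under that assumption, the factors follow from the ordinary SVD of Q^(1/2) X R^(1/2). The Gaussian-kernel R and the helper names are hypothetical choices for illustration.

```python
import numpy as np


def smoothing_operator(p, bandwidth=1.0):
    """Hypothetical R: Gaussian kernel over chemical-shift index."""
    idx = np.arange(p)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / bandwidth) ** 2)


def sqrt_and_inv_sqrt(M):
    """Symmetric square root and inverse square root of a positive definite M."""
    w, E = np.linalg.eigh(M)
    return (E * np.sqrt(w)) @ E.T, (E / np.sqrt(w)) @ E.T


def gpca(X, Q, R, k):
    """Generalized PCA sketch: X ~ U diag(d) V' with U'QU = I and V'RV = I.

    Columns of U are the (Q-normalized) sample PCs; columns of V are loadings.
    """
    Q_half, Q_inv_half = sqrt_and_inv_sqrt(Q)
    R_half, R_inv_half = sqrt_and_inv_sqrt(R)
    U_t, d, Vt_t = np.linalg.svd(Q_half @ X @ R_half, full_matrices=False)
    return Q_inv_half @ U_t[:, :k], d[:k], R_inv_half @ Vt_t[:k].T


# Usage on toy data: Q = I (independent samples), R = hypothetical smoother.
rng = np.random.default_rng(1)
X = rng.uniform(size=(8, 12))
X = X - X.mean(axis=0)
Q, R = np.eye(8), smoothing_operator(12)
U, d, V = gpca(X, Q, R, k=8)
print("first two normalized sample PCs:\n", U[:, :2])
```

Plotting columns of `U` against each other gives scatter plots of the kind described; with Q = R = I the same code reduces to ordinary PCA on the centred data.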
Fig. 2.
Amount of variance explained by the PCs for the five neural cell types data. Percentage of variance explained by individual PCs (top) and cumulative percentage of variance explained (bottom), comparing PCA with GPCA (left) and sparse non-negative PCA with sparse non-negative GPCA (right). The GPCA methods explain larger proportions of the sample variance.
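As a rough illustration of how such curves are computed, the sketch below gives the individual and cumulative proportions for ordinary PCA. It also gives a projection-based cumulative measure often used when loadings are not orthogonal, as with sparse loadings. It assumes column-centred data. The Q- and R-weighted versions presumably behind the GPCA curves are not implemented here, and the function names are hypothetical.

```python
import numpy as np


def variance_explained_pca(X):
    """Individual and cumulative proportion of variance explained by ordinary PCA."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    prop = s ** 2 / np.sum(s ** 2)
    return prop, np.cumsum(prop)


def cumulative_variance_projection(X, V):
    """Cumulative proportion of variance captured by projecting the centred data
    onto the span of the first k loading vectors, for k = 1..K.

    V is p x K and need not be orthogonal (e.g. sparse loadings), but its
    columns must be linearly independent.
    """
    Xc = X - X.mean(axis=0)
    total = np.sum(Xc ** 2)
    out = []
    for k in range(1, V.shape[1] + 1):
        Vk = V[:, :k]
        P = Vk @ np.linalg.solve(Vk.T @ Vk, Vk.T)  # projector onto span(Vk)
        out.append(np.sum((Xc @ P) ** 2) / total)
    return np.array(out)


rng = np.random.default_rng(2)
X = rng.normal(size=(15, 6))
prop, cum = variance_explained_pca(X)
Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2]
print("individual:", np.round(prop, 3))
print("cumulative:", np.round(cum, 3))
print("projection-based, first 3 PCs:",
      np.round(cumulative_variance_projection(X, Vt[:3].T), 3))
```

For orthonormal PCA loadings the projection-based measure coincides with the cumulative sum of squared singular values, which is a useful sanity check before applying it to sparse loadings.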
Fig. 3.
Proportion of features selected on the five neural cell types data by sparse non-negative PCA and GPCA, for individual PCs (top) and cumulatively across PCs (bottom). Sparse non-negative GPCA explains more of the sample variance with fewer features selected.
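A minimal sketch of the feature-selection proportions, assuming a feature counts as selected when its loading is nonzero. It further assumes that, cumulatively, a feature is selected once any of the first k PCs gives it a nonzero weight. That reading of "cumulative" is our interpretation, and the function name and toy loadings are hypothetical.

```python
import numpy as np


def selected_feature_proportions(V, tol=0.0):
    """Proportion of features with a nonzero loading, per PC and cumulatively.

    V is p x K with one loading vector per column. Cumulatively, a feature
    counts as selected once any of the first k PCs gives it a nonzero weight.
    """
    selected = np.abs(V) > tol
    per_pc = selected.mean(axis=0)
    cumulative = np.logical_or.accumulate(selected, axis=1).mean(axis=0)
    return per_pc, cumulative


# Toy loadings for 4 features and 3 PCs (illustrative only).
V = np.array([
    [0.8, 0.0, 0.0],
    [0.6, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
])
per_pc, cumulative = selected_feature_proportions(V)
print("per PC:", per_pc)
print("cumulative:", cumulative)
```

Here the third PC reuses a feature already selected by the first, so the cumulative proportion does not grow at that step.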
Fig. 4.
Sparse non-negative GPCA loadings and sample PC heatmaps for the first seven PCs, which together explain over 90% of the sample variance. Scaled PC loadings are superimposed on the average scaled spectra of neural stem cells, neurons, microglia and ‘Glia’ (oligodendrocytes and astrocytes). The loadings reveal important patterns across the samples, and spikes in the loadings mark the locations of peaks whose intensities vary greatly across the samples. For example, PC3 selects peaks with higher intensities in neural stem cells, while the peaks selected by PC5 have higher concentrations in microglia.
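The sketch below shows one way to check which cell type a PC's selected peaks favour, in the spirit of the PC3 and PC5 examples. Because a sparse non-negative loading only sums intensities at its selected peaks, the group with the largest mean sample score is the one whose spectra are most intense there. The unit-maximum scaling used for overlays, the group labels and the toy spectra are assumptions for illustration; the figure's exact scaling is not stated.

```python
import numpy as np


def scale_to_unit_max(x):
    """Rescale a non-negative vector so its largest entry is 1 (for overlays)."""
    m = np.max(x)
    return x / m if m > 0 else x


def mean_score_by_group(X, labels, v):
    """Average sample score X @ v within each group, for a non-negative loading v."""
    labels = np.asarray(labels)
    return {str(g): float((X[labels == g] @ v).mean()) for g in np.unique(labels)}


# Toy example: 6 spectra over 10 bins; only the "NSC" spectra carry a peak at bins 2-3.
X = np.full((6, 10), 0.1)
X[:2, 2:4] += 1.0
labels = ["NSC", "NSC", "neuron", "neuron", "microglia", "microglia"]
v = np.zeros(10)
v[2:4] = 0.7  # hypothetical sparse non-negative loading with a spike at bins 2-3
scores = mean_score_by_group(X, labels, v)
best = max(scores, key=scores.get)
overlay = scale_to_unit_max(v)  # loading rescaled for plotting over mean spectra
print(best, scores)
```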

