Exploring high-dimensional biological data with sparse contrastive principal component analysis
- PMID: 32176249
- DOI: 10.1093/bioinformatics/btaa176
Abstract
Motivation: Statistical analyses of high-throughput sequencing data have reshaped the biological sciences. Despite myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have led to meaningful progress. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.
Results: Inspired by recent proposals for using control data to remove unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA, that extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared with competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets.
Availability and implementation: A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub.
Contact: philippe_boileau@berkeley.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
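The Results paragraph names sparse contrastive PCA but does not spell out its mechanics. As a rough sketch only, assuming the contrastive-covariance formulation of contrastive PCA (the target covariance minus γ times the covariance of a background or control dataset), the Python code below takes the leading eigenvectors of C_target − γ·C_background and then sparsifies them by soft-thresholding. The soft-thresholding step is a simplified stand-in chosen for brevity, not the sparse PCA solver used by the scPCA package; the function names `contrastive_covariance` and `sparse_contrastive_loadings`, the fixed values of γ and the threshold, and the toy data are all hypothetical.

```python
import numpy as np


def contrastive_covariance(target, background, gamma):
    """Return C_target - gamma * C_background (samples in rows, features in columns)."""
    # Center each dataset by its own column means before forming covariances.
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    c_target = t.T @ t / (t.shape[0] - 1)
    c_background = b.T @ b / (b.shape[0] - 1)
    return c_target - gamma * c_background


def sparse_contrastive_loadings(target, background, gamma, k, threshold):
    """Top-k eigenvectors of the contrastive covariance, soft-thresholded.

    Hypothetical helper: soft-thresholding is a simplified stand-in for a
    proper sparse PCA solver and is NOT the scPCA package's algorithm.
    """
    c = contrastive_covariance(target, background, gamma)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(c)
    top = np.argsort(eigvals)[::-1][:k]
    loadings = eigvecs[:, top]
    # Shrink small loadings to exactly zero, then rescale columns to unit norm.
    loadings = np.sign(loadings) * np.maximum(np.abs(loadings) - threshold, 0.0)
    norms = np.linalg.norm(loadings, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns as zeros
    return loadings / norms


# Hypothetical toy data: feature 1 is a large nuisance source present in both
# datasets; feature 0 carries extra variance only in the target.
rng = np.random.default_rng(0)
n, p = 2000, 5
scales = np.array([1.0, 5.0, 1.0, 1.0, 1.0])
background = rng.normal(size=(n, p)) * scales
target = rng.normal(size=(n, p)) * scales
target[:, 0] += rng.normal(scale=3.0, size=n)

# Ordinary PCA on the target alone: leading eigenvector of its covariance.
pca_top = np.linalg.eigh(np.cov(target, rowvar=False))[1][:, -1]

# Contrastive, sparsified leading loading vector.
scpca_top = sparse_contrastive_loadings(
    target, background, gamma=1.0, k=1, threshold=0.1
)[:, 0]

print("dominant feature, ordinary PCA:", np.argmax(np.abs(pca_top)))
print("dominant feature, contrastive:", np.argmax(np.abs(scpca_top)))
```

In this toy setting, ordinary PCA on the target is drawn to the shared nuisance feature, whereas subtracting the background covariance cancels that feature and the thresholding step zeroes out small loadings. In practice γ and the sparsity level would need to be selected from the data rather than fixed by hand.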
Similar articles
- Meta-analytic principal component analysis in integrative omics application. Bioinformatics. 2018 Apr 15;34(8):1321-1328. doi: 10.1093/bioinformatics/btx765. PMID: 29186328.
- Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data. Bioinformatics. 2015 Aug 15;31(16):2683-90. doi: 10.1093/bioinformatics/btv197. Epub 2015 Apr 10. PMID: 25861969.
- Edge-group sparse PCA for network-guided high dimensional data analysis. Bioinformatics. 2018 Oct 15;34(20):3479-3487. doi: 10.1093/bioinformatics/bty362. PMID: 29726900.
- projectR: an R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering. Bioinformatics. 2020 Jun 1;36(11):3592-3593. doi: 10.1093/bioinformatics/btaa183. PMID: 32167521.
- Functional Data Analysis: An Introduction and Recent Developments. Biom J. 2024 Oct;66(7):e202300363. doi: 10.1002/bimj.202300363. PMID: 39330918. Review.
Cited by
- Deep learning in single-cell and spatial transcriptomics data analysis: advances and challenges from a data science perspective. Brief Bioinform. 2025 Mar 4;26(2):bbaf136. doi: 10.1093/bib/bbaf136. PMID: 40185158.
- A Pipeline for Natural Small Molecule Inhibitors of Endoplasmic Reticulum Stress. Front Pharmacol. 2022 Jul 22;13:956154. doi: 10.3389/fphar.2022.956154. eCollection 2022. PMID: 35935873.
- An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy. Gigascience. 2022 Dec 28;12:giad028. doi: 10.1093/gigascience/giad028. Epub 2023 Apr 26. PMID: 37099385.
- Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation. Front Bioinform. 2025 Feb 12;5:1519468. doi: 10.3389/fbinf.2025.1519468. eCollection 2025. PMID: 40013100.
- Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci. 2025 Jan;68(1):5-102. doi: 10.1007/s11427-023-2561-0. Epub 2024 Jul 23. PMID: 39060615. Review.