This is a preprint.
Deconfounded Dimension Reduction via Partial Embeddings
- PMID: 36711940
- PMCID: PMC9882043
- DOI: 10.1101/2023.01.10.523448
Abstract
Dimension reduction tools that preserve similarity and graph structure, such as t-SNE and UMAP, can capture complex biological patterns in high-dimensional data. However, these tools are typically not designed to separate effects of interest from unwanted effects due to confounders. We introduce the partial embedding (PARE) framework, which enables removal of confounders from any distance-based dimension reduction method. We then develop partial t-SNE and partial UMAP and apply these methods to genomic and neuroimaging data. Our results show that the PARE framework can remove batch effects in single-cell sequencing data and can separate clinical from technical variability in neuroimaging measures. We demonstrate that the PARE framework extends dimension reduction methods to highlight biological patterns of interest while effectively removing confounding effects.
Keywords: Dimension reduction; confounding effects; embeddings; genomics; neuroimaging.