Haisu: Hierarchically supervised nonlinear dimensionality reduction

Kevin Christopher VanHorn¹, Murat Can Çobanoğlu¹

PMID: 35862429
PMCID: PMC9345488
DOI: 10.1371/journal.pcbi.1010351

Kevin Christopher VanHorn et al. PLoS Comput Biol. 2022 Jul 21;18(7):e1010351. doi: 10.1371/journal.pcbi.1010351. eCollection 2022 Jul.

Affiliation

¹ Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

Abstract

We propose a novel strategy for incorporating hierarchical supervised label information into nonlinear dimensionality reduction techniques. Specifically, we extend t-SNE, UMAP, and PHATE to include known or predicted class labels and demonstrate the efficacy of our approach on multiple single-cell RNA sequencing datasets. Our approach, "Haisu," is applicable across domains and methods of nonlinear dimensionality reduction. In general, the mathematical effect of Haisu can be summarized as a variable perturbation of the high dimensional space in which the original data is observed. We thereby preserve the core characteristics of the visualization method and only change the manifold to respect known or assumed class labels when provided. Our strategy is designed to aid in the discovery and understanding of underlying patterns in a dataset that is heavily influenced by parent-child relationships. We show that using our approach can also help in semi-supervised settings where labels are known for only some datapoints (for instance when only a fraction of the cells are labeled). In summary, Haisu extends existing popular visualization methods to enable a user to incorporate labels known a priori into a visualization, including their hierarchical relationships as defined by a user input graph.
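As a rough illustration of what a "variable perturbation of the high dimensional space" could look like, the sketch below rescales pairwise Euclidean distances by a factor derived from the hop distance between class labels in a user-supplied hierarchy graph. The function names, the `strength` parameter, the toy label graph, and the specific formula `(1 - strength) + strength * h` are assumptions made for illustration; the abstract does not state the formula Haisu actually uses.

```python
from collections import deque

import numpy as np


def hop_distances(nodes, edges):
    """All-pairs shortest-path hop counts on an undirected label graph (BFS)."""
    index = {n: i for i, n in enumerate(nodes)}
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    hops = np.full((len(nodes), len(nodes)), np.inf)
    for src in nodes:
        s = index[src]
        hops[s, s] = 0
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if np.isinf(hops[s, index[nxt]]):
                    hops[s, index[nxt]] = hops[s, index[cur]] + 1
                    queue.append(nxt)
    return hops, index


def perturbed_distances(X, labels, nodes, edges, strength):
    """Scale Euclidean distances by a label-hierarchy factor (illustrative form only)."""
    hops, index = hop_distances(nodes, edges)
    h = hops / hops.max()  # normalize to [0, 1]; assumes a connected graph
    ids = np.array([index[label] for label in labels])
    # Assumed form: strength = 0 leaves distances unchanged; larger values
    # shrink distances between points whose labels are close in the graph.
    factor = (1.0 - strength) + strength * h[ids][:, ids]
    diff = X[:, None, :] - X[None, :, :]
    base = np.sqrt((diff ** 2).sum(axis=-1))
    return base * factor


# Hypothetical toy hierarchy and data, for illustration only.
nodes = ["root", "T", "B", "CD4", "CD8"]
edges = [("root", "T"), ("root", "B"), ("T", "CD4"), ("T", "CD8")]
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
labels = ["CD4", "CD4", "CD8", "CD8", "B", "B"]
D = perturbed_distances(X, labels, nodes, edges, strength=0.5)
```

A matrix like `D` could then be handed to any embedding method that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`, so the visualization method itself is left unchanged.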

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Users can control the effect of an input hierarchy on the resulting embedding.**
We demonstrate the effect of Haisu as applied to t-SNE, UMAP, and PHATE with an input graph on a set of random points. In (A), we display the unmodified embedding of each NLDR method without Haisu or any hierarchical prior. In (B), we demonstrate the default mode of Haisu, where self-distance = 0; at higher str values this results in a stronger hierarchical effect on the original embedding. In (C), self-distance = 1 for the disconnected class (blue), which is penalized for clustering near itself and spreads back toward the classes each point is most similar to. In (D), self-distance = 0.5, which still applies the hierarchy but encourages more inter-class interaction than (B). Finally, in (E), self-distance = 1 for all classes, which penalizes intra-class clustering, resulting in a much looser representation of the hierarchy compared with (B).

**Fig 2. For this dataset, Haisu applies an input hierarchy based on cell function and lineage to guide the identification of sub-clusters.**
We display the effect of our method on popular nonlinear DR approaches and PCA at multiple ‘strength’ values (str), a tunable factor ranging from 0 up to 1 that controls the strength of our hierarchical distancing function. Compared to raw NLDR (str = 0), Haisu reveals sub-clusters of T cells and better expresses the subtle relationships between datapoints in each method.

**Fig 3. Haisu applied to anatomical embryonic cardiac cell subpopulations via a proximity-based hierarchy.**
The raw embeddings of each method indicate two primary clusters with cell label assignments that are spaced out within each cluster. Haisu adds clarity to the embedding in a manner true to the known external hierarchy. Labels are assumed to be 100% accurate as they are location-based, but anatomic regions can have similar transcriptomic profiles. Thus, in this context, Haisu factors in both gene expression and location when determining a lower-dimensional embedding at an appropriate strength.

**Fig 4. We illustrate the effect of Haisu on an epithelial differentiation hierarchy in the context of healthy and ulcerative colitis patients.**
In this dataset, strength factors up to 0.8 preserve the appearance of the raw embedding. Thus, with sufficient confidence in cell type labels, Haisu preserves the structure of the NLDR method while also allowing a simpler examination of more subtle inter-cluster relationships.

**Fig 5. Haisu does not compromise the embedding of cells that do not have a label in the input graph.**
We depict 0% and 100% replacement of the TA 1 label with a ‘dummy’ label that is not present in the hierarchy across t-SNE, UMAP, and PHATE. Even at high strength values (str) of the hierarchical distancing factor, Haisu maintains the relationships circled in the figure. Notably, TA 1 cells remain close to Cycling TA cells across the embeddings at 100% replacement despite their distance in the hierarchy graph. Thus, we do not compromise the integrity of each NLDR method, allowing for the observation of unknown classes in the context of a strongly influential, known hierarchy.
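The captions for Fig 1 and Fig 5 describe two knobs beyond str: a per-class self-distance and the handling of labels that are absent from the hierarchy. The sketch below shows one way such a label-by-label scaling factor could be assembled. The function name, its arguments, the formula, and the choice to give a neutral factor of 1 to any pair involving a label outside the hierarchy are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def label_factor(h, self_distance, strength, in_hierarchy):
    """Build a (K, K) distance-scaling factor between K class labels.

    h             : (K, K) hierarchy distances between labels, normalized to [0, 1]
    self_distance : (K,) value placed on the diagonal for each label
    strength      : scalar in [0, 1]; 0 disables the hierarchy entirely
    in_hierarchy  : (K,) booleans; False marks a label missing from the graph
    """
    h = np.array(h, dtype=float)  # copy so the caller's matrix is not modified
    np.fill_diagonal(h, self_distance)
    factor = (1.0 - strength) + strength * h
    known = np.asarray(in_hierarchy, dtype=bool)
    # Assumed behavior: pairs involving an unknown label keep their raw distance.
    neutral = ~(known[:, None] & known[None, :])
    factor[neutral] = 1.0
    return factor


# Hypothetical labels: class "A" (self-distance 0), class "B" (self-distance 1),
# and a "dummy" label that is not part of the hierarchy.
h = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
factor = label_factor(h, self_distance=[0.0, 1.0, 0.0], strength=0.5,
                      in_hierarchy=[True, True, False])
```

Under these assumptions, a class with self-distance = 0 has its intra-class distances shrunk by `1 - strength`, a class with self-distance = 1 receives no intra-class pull, and cells carrying the dummy label are positioned by their raw distances alone.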
