Bioinformatics. 2023 Jan 1;39(1):btac736.

doi: 10.1093/bioinformatics/btac736.

Clustering single-cell multi-omics data with MoClust

Musu Yuan¹, Liang Chen², Minghua Deng¹,³,⁴

Affiliations

¹ Center for Quantitative Biology, Peking University, Beijing 100871, China.
² Huawei Technologies Co., Ltd., Beijing 100080, China.
³ School of Mathematical Sciences, Peking University, Beijing 100871, China.
⁴ Center for Statistical Science, Peking University, Beijing 100871, China.

PMID: 36383167
PMCID: PMC9805570
DOI: 10.1093/bioinformatics/btac736

Musu Yuan et al. Bioinformatics. 2023.


Abstract

Motivation: Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may offer novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse, and they contain doublets. Moreover, representations of different omics, even from the same cell, follow diverse distributions. Without proper distribution alignment, clustering methods produce less separable clusters that are easily degraded by less informative omics data.

Results: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module, which identifies and filters out doublets, is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders characterize the multi-omics data. Distribution alignment based on contrastive learning adaptively fuses the omics representations into an omics-invariant representation. This novel alignment improves the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering objective. Extensive experiments on both simulated and real multi-omics datasets demonstrated the strong alignment, doublet detection and clustering abilities of MoClust.
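The weighted fusion described above can be illustrated with a minimal sketch. It is not the MoClust implementation: the softmax parameterization of the weights, the function names `softmax` and `fuse`, and the toy latent vectors are all assumptions made for illustration. In MoClust the weights are learned under contrastive guidance, whereas here the scores are simply fixed by hand.

```python
import math


def softmax(scores):
    """Turn unconstrained scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(latents, scores):
    """Linearly fuse per-omics latent vectors of one cell.

    latents: one latent vector per omics, all of the same length.
    scores:  one unconstrained score per omics (hypothetical stand-in
             for weights that a model would learn).
    """
    dim = len(latents[0])
    if any(len(z) != dim for z in latents):
        raise ValueError("all latent vectors must share one dimension")
    weights = softmax(scores)
    fused = [sum(w * z[i] for w, z in zip(weights, latents)) for i in range(dim)]
    return fused, weights


# Toy example: a more informative RNA embedding receives a larger score.
rna_latent = [0.9, -0.2, 0.4]
protein_latent = [0.1, 0.3, -0.5]
fused, weights = fuse([rna_latent, protein_latent], scores=[1.0, 0.0])
```

Keeping the weights on the simplex makes them directly comparable across omics, which is one plausible way to read off how much each omics contributes to the fused representation.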

Availability and implementation: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Linear fusion may hamper clustering. (a) Without alignment, less informative omics data make the fused clusters less separable. (b) Without alignment, less informative omics data make fused clusters less compact. (c) Without alignment, accurate clusters are attainable with each omics data, but the combination may be of poor quality. (d) Existing one-to-one cluster alignment fails when one of multiple omics is indistinguishable

**Fig. 2.**
Framework overview. (a) Existing multi-omics sequencing methods, grouped by different omics they can sequence. (b) Structure of the MoClust model: (i) preprocessed multi-omics data (Cao and Gao, 2022) are used as input, while outputs are estimated posterior parameters of omics-specific statistical models (Section 2.1). (ii) A fusion layer is introduced to linearly fuse the latent features of different omics data and is guided by a contrastive learning module (Section 2.2). (iii) A Cauchy–Schwarz divergence-based clustering module (Section 2.3) and a novel automatic doublet detection module (Section 2.4) are added after the fusion layer
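For readers unfamiliar with the Cauchy–Schwarz divergence named in panel (iii), the sketch below computes it for two discrete distributions, using the standard definition D_CS(p, q) = -log(Σ p_i q_i / sqrt(Σ p_i² · Σ q_i²)). It only illustrates the quantity itself; how MoClust builds its clustering loss from it is described in Section 2.3 of the paper and is not reproduced here. The function name `cs_divergence` is hypothetical.

```python
import math


def cs_divergence(p, q):
    """Cauchy-Schwarz divergence between two non-negative vectors.

    Returns 0 when p and q are proportional and grows as their overlap
    shrinks; returns infinity when they share no support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    cross = sum(a * b for a, b in zip(p, q))
    if cross <= 0:
        return math.inf  # no overlap (or an all-zero input)
    norm = math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))
    return -math.log(cross / norm)


same = cs_divergence([0.5, 0.5], [0.5, 0.5])       # identical distributions
apart = cs_divergence([0.9, 0.1], [0.1, 0.9])      # weak overlap
disjoint = cs_divergence([1.0, 0.0], [0.0, 1.0])   # no overlap
```

By the Cauchy–Schwarz inequality the ratio inside the logarithm is at most 1, so the divergence is non-negative, which makes it usable as a separation measure between cluster assignment distributions.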

**Fig. 3.**
MoClust integrates scRNA and protein data. (a and b) Performance of MoClust and competing methods, measured by NMI and ARI, on the real CITE-seq datasets 10X10k and 10XInhouse. (c) CITE-seq simulation experiments. All simulated data were generated by Splatter, and the performance of each method was evaluated by ARI. (d) The change in fusion weights learned by MoClust when applied to different simulated datasets. (e) Two-dimensional UMAP visualization of latent features extracted by MoClust from the 10X10k dataset. From left to right: fused features, RNA features and protein features, colored by true cell types. (f) UMAP visualization of the fused features obtained by applying MoClust to the 10X10k dataset, colored by the expression of different marker proteins
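The ARI used in these panels compares a predicted clustering with reference labels while correcting for chance agreement. The following is a minimal standard-library sketch of the usual pair-counting formula, included only to make the metric concrete; it is not the evaluation code used in the paper, and the function name `adjusted_rand_index` is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index from the contingency table of two labelings."""
    n = len(true_labels)
    if n != len(pred_labels):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0  # no pairs to disagree on

    cells = Counter(zip(true_labels, pred_labels))  # contingency table
    rows = Counter(true_labels)                     # reference cluster sizes
    cols = Counter(pred_labels)                     # predicted cluster sizes

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Label names do not matter, only the grouping of cells.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions are identical up to relabeling, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance.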

**Fig. 4.**
(a and b) Performance of MoClust and competing methods, measured by NMI and ARI, on the real RNA+ATAC multi-omics datasets CellLine and 10XPBMC. (c) The change in fusion weights when clustering different subgroups of cell types. (d) The numbers of clusters estimated by SIMLR with 10 different random seeds, and the ARI of MoClust using each estimated number of clusters. (e and f) UMAP visualization and Sankey plot of the clustering results produced by MoClust on the 10XInHouse dataset. The doublet detection module is employed in the corresponding experiments

