Bioinformatics. 2024 Mar 29;40(4):btae169.
doi: 10.1093/bioinformatics/btae169.

Clustering single-cell multi-omics data via graph regularized multi-view ensemble learning

Fuqun Chen et al. Bioinformatics.

Abstract

Motivation: Single-cell clustering plays a crucial role in distinguishing cell types and in analyzing the mechanisms of cell heterogeneity. Many existing clustering methods identify cell clusters solely from gene expression data obtained by single-cell RNA sequencing (scRNA-seq), but the information contained in mono-omic data is often limited, which leads to suboptimal clustering performance. Single-cell multi-omics sequencing technologies make it possible to integrate multiple omics data types when identifying cell clusters, yet integrating them effectively remains challenging. In addition, because of the inherent characteristics of each data type, designing a clustering method that performs well across various types of multi-omics data is a persistent challenge.

Results: In this paper, we propose a graph-regularized multi-view ensemble clustering model for single-cell clustering (GRMEC-SC). The approach adaptively integrates multiple omics data types and leverages information from multiple base clustering results. We evaluate GRMEC-SC on five multi-omics datasets through a series of experiments, and the results show that it achieves competitive performance across diverse multi-omics datasets with varying characteristics.

Availability and implementation: Implementation of GRMEC-SC, along with examples, can be found on the GitHub repository: https://github.com/polarisChen/GRMEC-SC.

Conflict of interest statement

None declared.

Figures

Figure 1.
Schematic overview of the proposed GRMEC-SC model. First, multiple base clustering methods are applied separately to each omics data type to produce base clustering results. Then the consensus low-dimensional representation matrix W and the consensus co-cluster affinity matrix S are learned from the multi-omics data and from the base clustering results, respectively. Finally, W and S are incorporated into two graph regularization terms that guide the learning of the cluster indicator matrix H.
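To make the role of S and the graph regularization more concrete, here is a minimal sketch under simplifying assumptions. It does not reproduce GRMEC-SC: the paper learns a consensus S, whereas this sketch substitutes the classic co-association matrix (the fraction of base clusterings that place two cells in the same cluster). It also uses a hard one-hot H and evaluates the common Laplacian regularizer tr(HᵀLH) with L = D − S; whether the paper's terms take exactly this form is an assumption. All function names are hypothetical and are not taken from the GRMEC-SC repository.

```python
def co_association(base_labelings):
    """S[i][j] = fraction of base clusterings that put cells i and j together."""
    m = len(base_labelings)
    n = len(base_labelings[0])
    S = [[0.0] * n for _ in range(n)]
    for labels in base_labelings:
        if len(labels) != n:
            raise ValueError("every base clustering must label the same cells")
        # Label ids are only compared within one base clustering,
        # so arbitrary cluster numbering across runs does not matter.
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    S[i][j] += 1.0 / m
    return S


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S, with D the diagonal degree matrix."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def graph_regularizer(H, S):
    """tr(H^T L H): small when cells with high affinity get similar rows of H."""
    L = laplacian(S)
    n, k = len(H), len(H[0])
    total = 0.0
    for c in range(k):
        for i in range(n):
            for j in range(n):
                total += H[i][c] * L[i][j] * H[j][c]
    return total


def one_hot(labels, k):
    """Hard cluster indicator matrix (n cells x k clusters)."""
    return [[1.0 if lab == c else 0.0 for c in range(k)] for lab in labels]


if __name__ == "__main__":
    # Three hypothetical base clusterings of four cells.
    base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    S = co_association(base)
    H = one_hot([0, 0, 1, 1], 2)
    print(graph_regularizer(H, S))  # ≈ 1.333 (= 4/3)
```

For symmetric S, tr(HᵀLH) equals one half of the sum over all pairs (i, j) of S_ij‖h_i − h_j‖², so the penalty only grows when cells that most base clusterings group together are assigned to different clusters in H. That is the intuition behind using S to guide H.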
Figure 2.
Sensitivity analysis of the hyper-parameters on the Inhouse dataset. (a) ARI on Inhouse dataset. (b) NMI on Inhouse dataset.
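ARI (adjusted Rand index) and NMI (normalized mutual information) are the metrics reported in these figures. The sketch below computes both from scratch so that it runs without extra libraries. NMI is normalized by the arithmetic mean of the two entropies. This matches scikit-learn's default, but which normalization the authors used is an assumption. The function names are hypothetical.

```python
import math
from collections import Counter


def _comb2(x):
    # Number of unordered pairs among x items.
    return x * (x - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two hard clusterings (1.0 = identical up to relabeling)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    pairs_same = sum(_comb2(v) for v in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(_comb2(v) for v in Counter(labels_true).values())
    sum_cols = sum(_comb2(v) for v in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / _comb2(n)
    max_index = (sum_rows + sum_cols) / 2.0
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (pairs_same - expected) / (max_index - expected)


def _entropy(counts, n):
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI = I(U; V) / ((H(U) + H(V)) / 2), using natural logarithms."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    rows, cols = Counter(labels_true), Counter(labels_pred)
    contingency = Counter(zip(labels_true, labels_pred))
    mi = sum((nij / n) * math.log(n * nij / (rows[u] * cols[v]))
             for (u, v), nij in contingency.items())
    denom = (_entropy(rows, n) + _entropy(cols, n)) / 2.0
    if denom == 0.0:  # both labelings put every cell in one cluster
        return 1.0
    return mi / denom


if __name__ == "__main__":
    truth, pred = [0, 0, 1, 1], [0, 0, 1, 2]
    print(adjusted_rand_index(truth, pred))     # ≈ 0.571 (= 4/7)
    print(normalized_mutual_info(truth, pred))  # ≈ 0.8
```

Both scores are invariant to how cluster ids are numbered, which is why they are suitable for comparing predicted clusters with annotated cell types.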
Figure 3.
Clustering performance of GRMEC-SC and of various base clustering methods on different datasets. Each base clustering method is applied separately to each omics data type. (a) Results on CITE-seq datasets. (b) Results on multi-omics datasets that include both scRNA-seq and scATAC-seq data.
Figure 4.
Marker genes identified on the 10X-pbmc-3k dataset. (a) Dot plot of the average expression of the top-4 differentially expressed genes within each predicted cluster. (b) Violin plots of the identified marker genes across clusters.
