Brief Bioinform. 2024 Sep 23;25(6):bbae540.
doi: 10.1093/bib/bbae540.

Single-cell mosaic integration and cell state transfer with auto-scaling self-attention mechanism


Zhiwei Rong et al. Brief Bioinform.

Abstract

The integration of data from multiple modalities generated by single-cell omics technologies is crucial for accurately identifying cell states. One challenge in comprehending multi-omics data is mosaic integration, in which different data modalities are profiled in different subsets of cells, because it requires simultaneous batch effect removal and modality alignment. Here, we develop Multi-omics Mosaic Auto-scaling Attention Variational Inference (mmAAVI), a scalable deep generative model for single-cell mosaic integration. Leveraging auto-scaling self-attention mechanisms, mmAAVI can map arbitrary combinations of omics to a common embedding space. When well-annotated cell states are available, the model can perform semisupervised learning to make use of these annotations. We validated the performance of mmAAVI and five other commonly used methods on four benchmark datasets, which vary in cell numbers, omics types, and missing patterns. mmAAVI consistently outperformed the other methods. We also validated mmAAVI's ability to transfer cell state knowledge, achieving balanced accuracies of 0.82 and 0.97 with less than 1% of cells labeled, between batches with completely different omics. The full package is available at https://github.com/luyiyun/mmAAVI.

Keywords: mosaic integration; self-attention; semi-supervised learning; single-cell; variational inference.

Figures

Figure 1
Schematic of mosaic integration and mmAAVI. (a) Graph illustration of mosaic integration data with three batches and three modalities. Batch effects may arise from differences in individuals, experimental conditions, or tissue sources. Potential modalities include chromatin accessibility (DNA level), gene expression (RNA level), and epitope (protein level), among others. Due to the combined impact of batch effects and modality missingness, biological variations (such as cell types) are obscured by systematic errors, making direct integration analysis challenging. (b, c) The workflow for unsupervised (b) and semisupervised (c) analysis using mmAAVI. (d) Schematic of the mmAAVI model. Multimodal data for each cell are transformed into embeddings by modality-specific encoders and then fused into a global feature, a low-dimensional representation of the cell state that follows a mixture distribution parameterized by a discrete variable and a continuous variable. A discriminator is used to harmonize the distribution of the cell-level latent representation across different batches. Meanwhile, a guidance graph with prior knowledge is transformed into feature embeddings by a graph encoder. Next, modality-specific hybrid decoders map samples from the posterior distributions of the latent variables, along with the batch, to the parameters of the distribution for each feature of the existing modalities. The posterior mean of the cell-level latent representation can be used as input to clustering and visualization algorithms.
Figure 2
Benchmark results on the human PBMC dataset and mouse brain cortex dataset. (a) Layout of input data matrices in the human PBMC dataset. (b, c) UMAP visualization of cell embeddings learned by mmAAVI for the human PBMC dataset, where cells are colored by cell batches (b) and cell types (c). (d) The means of performance scores of mmAAVI and other methods on the PBMC dataset using the coarse annotation (four categories) for cell type labels. (e) Layout of input data matrices in the mouse brain cortex dataset. (f, g) UMAP visualization of cell embeddings learned by mmAAVI for the mouse brain cortex dataset, where cells are colored by cell batches (f) and cell types (g). (h) The means of performance scores of mmAAVI and other methods on the mouse brain cortex dataset.
Figure 3
Benchmark results on the human nephron dataset (Nephron) and adult mouse cortical neuron cell (Triple) dataset. (a) Layout of input data matrices in the Nephron dataset. (b–d) UMAP visualization of cell embeddings learned by mmAAVI for the Nephron dataset, where cells are colored by samples (b), omics (c), and cell types (d). (e) The means of performance scores of mmAAVI and other methods on the Nephron dataset. (f–i) UMAP visualization of cell embeddings learned by mmAAVI for the Triple dataset, where cells are colored by cell batches (f) and cell types (g–i) for each omics layer. (k) Layout of input data matrices in the Triple dataset. (j–l) The means of performance scores of mmAAVI and other methods on the Triple dataset. (m) The total score of all methods on the Triple dataset.
Figure 4
Results of semisupervised learning on the human nephron dataset (Nephron) and adult mouse cortical neuron cell (Triple) dataset. (a, b) The test balanced accuracy (bACC) of mmAAVI and benchmark approaches on each batch of the Nephron (a) and Triple (b) datasets, where "global" represents the bACC on all test cells, and "average" represents the mean of the per-batch bACCs. (c, d) UMAP visualization of cell embeddings learned by the semisupervised version of mmAAVI for the Nephron (c) and Triple (d) datasets, where cells are colored by the annotation of "seed" cells, the prediction of the semisupervised version of mmAAVI, and the true cell types. (e, f) The performance scores of the semisupervised and unsupervised versions of mmAAVI on the Nephron (e) and Triple (f) datasets.
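
The difference between the "global" and "average" bACC in Figure 4 can be illustrated with a minimal sketch. It assumes the standard definition of balanced accuracy (the mean of per-class recall); the authors' exact computation may differ. The function names and the toy labels are hypothetical and are not part of the mmAAVI package.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def global_and_average_bacc(y_true, y_pred, batch):
    """Return (global bACC, average of per-batch bACCs, per-batch bACC dict)."""
    y_true, y_pred, batch = (np.asarray(a) for a in (y_true, y_pred, batch))
    # "global": pool all test cells, ignoring which batch they came from
    global_bacc = balanced_accuracy(y_true, y_pred)
    # "average": compute bACC within each batch, then take the unweighted mean
    per_batch = {
        b.item(): balanced_accuracy(y_true[batch == b], y_pred[batch == b])
        for b in np.unique(batch)
    }
    average_bacc = float(np.mean(list(per_batch.values())))
    return global_bacc, average_bacc, per_batch


# Hypothetical toy labels: six test cells from two batches
y_true = ["T", "T", "B", "B", "T", "B"]
y_pred = ["T", "B", "B", "B", "T", "T"]
batch = [0, 0, 0, 0, 1, 1]
g, a, per_batch = global_and_average_bacc(y_true, y_pred, batch)
print(g, a, per_batch)
```

The two summaries can diverge when batches differ in size or class composition, which may be why both are reported.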
