Genome Biol. 2023 Jul 11;24(1):163.
doi: 10.1186/s13059-023-02989-8.

CMOT: Cross-Modality Optimal Transport for multimodal inference

Sayali Anil Alatkar et al. Genome Biol.

Abstract

Multimodal measurements of single-cell sequencing technologies facilitate a comprehensive understanding of specific cellular and molecular mechanisms. However, simultaneous profiling of multiple modalities of single cells is challenging, and data integration remains elusive due to missing modalities and cell-cell correspondences. To address this, we developed a computational approach, Cross-Modality Optimal Transport (CMOT), which aligns cells within available multi-modal data (source) onto a common latent space and infers missing modalities for cells from another modality (target) of mapped source cells. CMOT outperforms existing methods in various applications, from the developing brain and cancers to immunology, and provides biological interpretations that improve cell-type or cancer classifications.

Keywords: Cross-modal inference; Multimodal data alignment; Optimal transport; Probabilistic coupling; Single-cell multi-modality; Weighted nearest neighbor.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Cross-Modality Optimal Transport (CMOT). CMOT is a computational approach that infers missing modalities for existing single-cell modalities. It has three main steps: A Alignment (optional), B Optimal Transport, and C k-Nearest Neighbors inference. CMOT takes two modalities, X and Y (source), as input; the cells in X and Y need not correspond completely. The cell-to-cell correspondence information between X and Y can be specified through p. If the cells in X and Y do not have complete correspondence, CMOT aligns X and Y onto a common low-dimensional latent space using non-linear manifold alignment (NMA). Then, CMOT uses optimal transport (OT) to map the cells in source Y to the cells in target Ŷ, where Y and Ŷ share modalities. CMOT minimizes the cost of transportation by finding the Wasserstein distance between cells in Y and Ŷ, which is further regularized by prior knowledge or induced cell clusters and by the entropy of transport. Finally, CMOT infers the missing modality X̂ for cells in Ŷ using k-Nearest Neighbors (kNN): for every cell in Ŷ, it calculates a weighted average of the k nearest mapped cells in Y, using their values from X, to infer X̂.
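The optimal transport and kNN steps described in this caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes uniform cell masses, a squared Euclidean cost, plain Sinkhorn iterations for the entropy term, a barycentric projection to place source cells in the target space, and inverse-distance kNN weights. The optional alignment step and the cluster-based regularizer are omitted, and the function names `sinkhorn_coupling` and `infer_missing_modality` are hypothetical.

```python
import numpy as np

def sinkhorn_coupling(Y_src, Y_tgt, reg=0.1, n_iter=200):
    """Entropy-regularized OT coupling between source and target cells.

    Assumes uniform mass on every cell and a squared Euclidean cost
    (illustrative choices, not taken from the paper).
    """
    # Pairwise squared Euclidean cost, shape (n_src, n_tgt).
    cost = ((Y_src[:, None, :] - Y_tgt[None, :, :]) ** 2).sum(axis=2)
    cost = cost / max(cost.max(), 1e-12)  # rescale so `reg` has a stable meaning
    K = np.exp(-cost / reg)               # Gibbs kernel
    a = np.full(Y_src.shape[0], 1.0 / Y_src.shape[0])  # source marginal
    b = np.full(Y_tgt.shape[0], 1.0 / Y_tgt.shape[0])  # target marginal
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)  # scale columns toward the target marginal
        u = a / (K @ v)    # scale rows toward the source marginal
    return u[:, None] * K * v[None, :]

def infer_missing_modality(X_src, Y_src, Y_tgt, k=5, reg=0.1):
    """Infer X-hat for target cells from the k nearest mapped source cells."""
    G = sinkhorn_coupling(Y_src, Y_tgt, reg=reg)
    # Barycentric projection: move each source cell to the coupling-weighted
    # mean of the target cells it transports mass to.
    Y_mapped = (G / G.sum(axis=1, keepdims=True)) @ Y_tgt
    # Distance from every target cell to every mapped source cell.
    d = np.sqrt(((Y_tgt[:, None, :] - Y_mapped[None, :, :]) ** 2).sum(axis=2))
    nn = np.argsort(d, axis=1)[:, :k]         # indices of the k nearest source cells
    nn_d = np.take_along_axis(d, nn, axis=1)  # their distances
    w = 1.0 / (nn_d + 1e-8)                   # inverse-distance weights (assumption)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of the neighbors' X profiles, shape (n_tgt, n_features_X).
    return (w[:, :, None] * X_src[nn]).sum(axis=1)
```

Because each inferred profile is a convex combination of source profiles, the inferred values cannot leave the range observed in X. This is a property of the sketch's kNN averaging, not a claim about CMOT's reported results.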
Fig. 2
Single-cell gene expression inference from chromatin accessibility in developing human brain. A Cell-wise Pearson correlation (y-axis) of inferred and measured gene expression by different methods (x-axis): CMOT (p = 25%, 50%, 75%, 100%), Seurat, MOFA+ (Additional File 1: Tables S1-S4). See Additional File 1: Fig. S1A and Additional File 1: Supplementary Methods for additional benchmarking. B Gene-wise correlation between the inferred and measured gene expression, comparing CMOT (y-axis) with MOFA+ and Seurat (x-axis). Dots: genes; Numbers: numbers of genes with improved inference by the compared methods. P-values are from one-sided Wilcoxon rank-sum tests. C Gene-wise AUPRC of cell-type marker genes for classifying the cell type. Dashed line: baseline = 0.1 (see the “Methods” section). D The measured (x-axis) versus inferred normalized expression (y-axis) of genes (dots) for one cell from each of four cell types: nIPC/GluN1 (first), E.C./Peric. (second), IN (third), R.G. (last). E Heatmap showing different enriched terms ranked by −log10(adj. p-value of enrichment) for the top 100 highly predictive genes within each cell type (see the “Methods” section). r is the Pearson correlation coefficient. p is the correlation p-value.
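The cell-wise and gene-wise Pearson correlations used in panels A and B differ only in the axis along which the measured and inferred matrices are compared. A minimal sketch follows, assuming both matrices are arranged as cells by genes. The helper names `rowwise_pearson` and `cellwise_and_genewise` are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def rowwise_pearson(A, B):
    """Pearson correlation between matching rows of A and B (same shape)."""
    A_c = A - A.mean(axis=1, keepdims=True)
    B_c = B - B.mean(axis=1, keepdims=True)
    num = (A_c * B_c).sum(axis=1)
    den = np.sqrt((A_c ** 2).sum(axis=1) * (B_c ** 2).sum(axis=1))
    # Rows with zero variance give a zero denominator and yield NaN.
    return num / den

def cellwise_and_genewise(measured, inferred):
    """Return (cell-wise r, gene-wise r) for cells x genes matrices."""
    cell_r = rowwise_pearson(measured, inferred)      # one value per cell
    gene_r = rowwise_pearson(measured.T, inferred.T)  # one value per gene
    return cell_r, gene_r
```

Cell-wise values summarize how well each cell's whole profile is recovered, while gene-wise values show which individual features are predicted well across cells.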
Fig. 3
Inferring protein expression from gene expression in single-cell peripheral blood mononuclear cells. A Cell-wise Pearson correlation (y-axis) of inferred and measured protein expression by different methods (x-axis): CMOT (p = 25%, 50%, 75%, 100%), Seurat, MOFA+, TotalVI (Additional File 1: Tables S11-S12). B The measured (x-axis) versus inferred normalized expression (y-axis) of 14 proteins (dots) for two select cells. C Pearson correlations of inferred and measured expression (y-axis) of individual proteins (x-axis) by CMOT (p = 100%, 75%), Seurat, MOFA+, TotalVI (Additional File 1: Table S13). D UMAPs of inferred and measured expressions for three proteins: TIGIT (r = 0.45; p = 6.18e−197) (top), CD16 (r = 0.75; p = 0) (middle), CD8a (r = 0.55; p = 3.11e−312) (bottom) (Additional File 1: Table S13). The intensity represents the protein expression level. r is the Pearson correlation coefficient. p is the correlation p-value.
Fig. 4
Inference of gene expression for drug-treated A549 lung cancer cells using chromatin accessibility. A Cell-wise Pearson correlation (y-axis) of inferred and measured gene expression by different methods (x-axis): CMOT (p = 25%, 50%, 75%, 100%), Seurat, MOFA+ (Additional File 1: Tables S17-S20). See Additional File 1: Fig. S4A and Supplementary Methods for additional benchmarking. B Gene-wise correlation between the inferred (y-axis) and measured (x-axis) expression, comparing CMOT with MOFA+ and Seurat. Dots: genes; Numbers: gene numbers above and below the dotted line. P-values are calculated by a one-sided Wilcoxon rank-sum test (Additional File 1: Table S21). C CMOT-inferred normalized gene expression trend (y-axis) across treatment hours (x-axis). Key genes: PER1 and BIRC3 [–25] are markers for glucocorticoid receptor (GR) activation seen later in treatment (3 h). ZSWIM6 [26] is a key gene of early events of DEX treatment (0 h, 1 h) (see Additional File 2 for top 100 highly predictive genes). D Enriched terms associated with CMOT-inferred gene expression using 435 genes with a higher gene-wise Pearson correlation, compared to MOFA+’s 748 genes (B, Additional File 1: Fig. S6B). E The measured (x-axis) versus inferred normalized expression (y-axis) of genes (dots) for three select cells. r is the Pearson correlation coefficient. p is the correlation p-value.
Fig. 5
Cross-modality inference between gene expression and chromatin accessibility can distinguish cancer types. A Cell-wise Pearson correlation (y-axis) of inferred and measured gene expression by different methods (x-axis): CMOT (p = 25%, 50%, 75%, 100%), Seurat, MOFA+ (Additional File 1: Tables S22-S25). B Silhouette score (x-axis) across measured and inferred gene expressions (x-axis), and measured chromatin peaks (Additional File 1: Table S26). C PCA of inferred gene expression. D Gene-wise correlation between the inferred and measured expression, comparing CMOT (y-axis) with MOFA+ and Seurat (x-axis). Dots: genes; Numbers: gene numbers above and below the dotted line. E Peak-wise AUROC, comparing CMOT (y-axis) with MOFA+ and Seurat (x-axis). Dots: peaks; Numbers: peak numbers above and below the dotted line. P-values are calculated by a one-sided Wilcoxon rank-sum test.
