Patterns (N Y). 2022 Sep 1;3(9):100577.
doi: 10.1016/j.patter.2022.100577. eCollection 2022 Sep 9.

Single-cell multi-modal GAN reveals spatial patterns in single-cell data from triple-negative breast cancer

Matthew Amodio et al. Patterns (N Y).

Abstract

Exciting advances in technologies to measure biological systems are currently at the forefront of research. The ability to gather data along an increasing number of omic dimensions has created a need for tools that analyze all of this information together, rather than siloing each technology into a separate analysis pipeline. To advance this goal, we introduce a framework called the single-cell multi-modal generative adversarial network (scMMGAN), which uses a combination of adversarial learning and data geometry techniques to integrate data from multiple modalities into a unified representation in the ambient data space for downstream analysis. The framework's key improvement is an additional diffusion geometry loss with a new kernel that constrains the otherwise over-parameterized GAN. We demonstrate that scMMGAN produces more meaningful alignments than alternative methods across a wide variety of data modalities and that its output can be used to draw conclusions from real-world biological experimental data.
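The diffusion geometry named above rests on eigenvectors of a diffusion operator built from a kernel over cells. The sketch below, assuming NumPy, computes such eigenvectors with an ordinary Gaussian kernel; it does not reproduce the new kernel the abstract refers to, and the function name `diffusion_eigenvectors` and its `sigma` bandwidth parameter are our own illustrative choices.

```python
import numpy as np


def diffusion_eigenvectors(X, n_components=3, sigma=1.0):
    """Leading non-trivial eigenpairs of a diffusion operator built on the rows of X.

    Assumption: a plain Gaussian kernel with bandwidth `sigma` stands in for
    scMMGAN's own kernel, which is not reproduced here.
    """
    # Pairwise squared Euclidean distances between cells (rows of X).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # The diffusion operator is P = D^-1 K. Its symmetric conjugate
    # A = D^-1/2 K D^-1/2 has the same eigenvalues, so eigh can be used.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    vals, phi = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]  # largest eigenvalues first
    vals, phi = vals[order], phi[:, order]

    # Right eigenvectors of P are psi = D^-1/2 phi.
    psi = phi / np.sqrt(d)[:, None]

    # Index 0 is the trivial eigenpair (eigenvalue 1, constant psi); skip it.
    return vals[1:n_components + 1], psi[:, 1:n_components + 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # 50 toy "cells" with 4 features each
vals, psi = diffusion_eigenvectors(X, n_components=3, sigma=1.5)
print(vals.shape, psi.shape)  # (3,) (50, 3)
```

Working with the symmetric conjugate rather than the row-stochastic operator lets `numpy.linalg.eigh` return real, sorted-by-magnitude eigenpairs, which are then converted back to eigenvectors of the diffusion operator itself.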

Keywords: Deep Learning; GANs; generative adversarial networks; manifold learning; scRNAseq.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
scMMGAN architecture and the correspondence loss. (A) The scMMGAN architecture mapping between multiple domains, each consisting of a pair of generators and a discriminator. (B) In addition to the discriminator loss, each domain has two additional losses. (C) Hypothetical demonstration of the data geometry guiding alignment through the correspondence loss. In the depicted space, data in the two domains have been shifted and rotated, but the intrinsic data geometry, as captured by the values of the diffusion eigenvectors, is preserved. (D) Hypothetical illustration of a bad mapping that is invertible (low reconstruction loss) but does not align analogous representations (high correspondence loss), and a good mapping that is both invertible and aligns analogous representations. When minimal changes to gene values are preferred, the mapping on the left unnecessarily changes the value of the gene on the x axis.
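To make the per-domain losses in panels (B) through (D) concrete, here is a minimal NumPy sketch. It assumes the reconstruction loss is a cycle-consistency mean squared error, and it models the correspondence loss as one minus the absolute correlation between same-index diffusion eigenvectors of a batch before and after mapping. This formulation, the Gaussian kernel, and all function names (`diffusion_coords`, `reconstruction_loss`, `correspondence_loss`) are hypothetical stand-ins, not scMMGAN's published definitions.

```python
import numpy as np


def diffusion_coords(X, k=2, sigma=1.0):
    # Gaussian-kernel diffusion eigenvectors (a stand-in for scMMGAN's kernel).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    d = K.sum(axis=1)
    vals, phi = np.linalg.eigh(K / np.sqrt(np.outer(d, d)))
    psi = phi[:, np.argsort(vals)[::-1]] / np.sqrt(d)[:, None]
    return psi[:, 1:k + 1]  # drop the trivial constant eigenvector


def reconstruction_loss(X, G_ij, G_ji):
    # Cycle consistency: mapping domain i -> j -> i should recover the input.
    return float(np.mean((G_ji(G_ij(X)) - X) ** 2))


def correspondence_loss(X, G_ij, k=2, sigma=1.0):
    # Compare the c-th diffusion eigenvector of the input batch with the c-th
    # eigenvector of the mapped batch. Eigenvectors are defined only up to
    # sign, so the absolute Pearson correlation is used.
    psi_x = diffusion_coords(X, k, sigma)
    psi_y = diffusion_coords(G_ij(X), k, sigma)
    corrs = np.array(
        [abs(np.corrcoef(psi_x[:, c], psi_y[:, c])[0, 1]) for c in range(k)]
    )
    return float(np.mean(1.0 - corrs))


# Toy check in the spirit of panel (C): a shift plus rotation preserves geometry.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix


def G_ij(A):
    return A @ Q + 1.0  # rotate, then shift


def G_ji(B):
    return (B - 1.0) @ Q.T  # exact inverse of G_ij


rec_loss = reconstruction_loss(X, G_ij, G_ji)  # close to 0
corr_loss = correspondence_loss(X, G_ij)       # close to 0
```

Because a shift plus rotation leaves pairwise distances unchanged, both losses come out near zero for this toy mapping, mirroring panel (C). The absolute value matters: an eigensolver may return a sign-flipped eigenvector for the mapped batch even when the geometry is identical.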
Figure 2
Results comparison from the DBIT-seq experiment. Corresponding proteomic and transcriptomic expression on the DBIT-seq data for the gene shown. The x and y axes are the measured spatial coordinates, taken directly from the data. The ground-truth transcriptomic values are plotted alongside each model's generated proteomic values; scMMGAN models the data best.
Figure 3
Design of the uncertainty quantification experiment. (A) How scMMGAN can be used to quantify the uncertainty associated with the mapping to each gene. A particular cell is mapped from Domain i to Domain j with several different noise samples. The mapped values of Gene A change substantially with the noise, whereas those of Gene B change little for this cell. We interpret this as a measure of how much information Domain i carries about each gene. (B) The genes that scMMGAN identifies as having the most uncertainty in the mapping, and thus the least information in common with the proteomic measurements in this dataset.
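The noise-resampling procedure in panel (A) can be sketched as follows, assuming NumPy and a generator that accepts an input batch plus a noise batch. The signature `generator(xs, z)`, the use of standard deviation as the spread measure, and the toy two-gene generator are all hypothetical illustrations, not scMMGAN's code.

```python
import numpy as np


def mapping_uncertainty(x, generator, n_samples=200, noise_dim=2, seed=0):
    """Per-gene standard deviation of one cell's mapped values across noise draws.

    `generator(xs, z)` is a hypothetical conditional generator from Domain i
    to Domain j that takes a batch of inputs and a batch of noise vectors.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, noise_dim))
    xs = np.repeat(x[None, :], n_samples, axis=0)  # same cell, many noise draws
    mapped = generator(xs, z)                      # shape (n_samples, n_genes_j)
    return mapped.std(axis=0)


# Hypothetical toy generator with two output genes: gene A (index 0) depends
# strongly on the noise, gene B (index 1) only weakly.
W = np.array([[1.0, 0.5], [0.2, -1.0], [0.3, 0.3]])
noise_scale = np.array([1.0, 0.05])


def toy_generator(xs, z):
    return xs @ W + z * noise_scale


cell = np.array([0.4, -1.2, 2.0])
u = mapping_uncertainty(cell, toy_generator)
ranking = np.argsort(u)[::-1]  # genes ordered from most to least uncertain
```

Sorting genes by this spread, as `ranking` does, is one way to produce a list like panel (B); here gene A ranks first because its noise scale is 20 times larger.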
Figure 4
Results comparison from the ATAC-seq/RNA-seq experiment. Ground-truth values for held-out cells and each model's predictions when mapping between ATAC and RNA sequencing. scMMGAN's output matches the ground truth more accurately than the other models, which inverted populations through the mapping. Coordinates shown are the first two principal component analysis (PCA) dimensions.
Figure 5
Analysis of scMMGAN alignment and clusters on the triple-negative breast cancer dataset. (A) PCA coordinates of the gene expression values from the two distributions. In the raw data, the spatial RNA-seq and scRNA-seq are not directly comparable, as they are entirely separable. After mapping with scMMGAN, they are aligned and can be compared in downstream analysis. (B) Spatial RNA-seq mapped to scRNA-seq, with the generated scRNA-seq values clustered and each cluster plotted at the measured spatial coordinates on the x and y axes. (C) Spatial RNA-seq data generated from scRNA-seq, including generated spatial coordinates; same coordinates as the previous plot. (D) All generated clusters mapped to the spatial RNA-seq space; same coordinates as the previous plots.
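Panel (B)'s pipeline (map spots, cluster the generated profiles, color by measured location) might look like the sketch below. It assumes NumPy, uses plain Lloyd's k-means with farthest-point seeding as a stand-in for whatever clustering was actually used, and substitutes a hypothetical identity function for a trained spatial-to-scRNA-seq generator; `cluster_on_spatial_grid` is likewise an illustrative name.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    # Plain Lloyd's k-means (a stand-in for the clustering actually used).
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Farthest-point seeding: the next center is the point farthest from
        # all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers, keeping the old center if a cluster empties out.
        centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
    return labels


def cluster_on_spatial_grid(spatial_expr, coords, G_spatial_to_sc, k):
    """Map spatial spots into scRNA-seq space, cluster there, and return
    (coords, labels) so each cluster can be drawn at its measured location."""
    generated = G_spatial_to_sc(spatial_expr)
    return coords, kmeans(generated, k)


# Toy data: 40 spots on two columns of a grid, with two expression programs.
rng = np.random.default_rng(1)
coords = np.column_stack([np.repeat([0.0, 1.0], 20), rng.uniform(size=40)])
expr = np.vstack([rng.normal(0.0, 0.5, size=(20, 5)),
                  rng.normal(10.0, 0.5, size=(20, 5))])


def identity_generator(a):  # hypothetical placeholder for a trained generator
    return a


xy, labels = cluster_on_spatial_grid(expr, coords, identity_generator, k=2)
```

The returned coordinates and labels can then be passed to a scatter plot (for example, matplotlib's `plt.scatter(xy[:, 0], xy[:, 1], c=labels)`) to draw each cluster at its measured position.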
Figure 6
Generated scMMGAN expression values plotted on the spatial coordinates. The x and y axes are the raw measured spatial coordinates from the spatial RNA-seq. Color indicates expression value: the original spatial RNA-seq expression of a gene is compared with that gene's generated scRNA-seq values from each method, showing that scMMGAN best aligns the original and generated values.
