Sci Rep. 2023 Jan 28;13(1):1618.
doi: 10.1038/s41598-023-28952-y.

Improved downstream functional analysis of single-cell RNA-sequence data using DGAN

Diksha Pandey et al. Sci Rep. 2023.

Abstract

The dramatic increase in the number of single-cell RNA-sequencing (scRNA-seq) studies reflects the new capabilities of next-generation sequencing technologies, which enable accurate measurement of the expression levels of tens of thousands of RNAs at cellular resolution. Nevertheless, missing values caused by failed RNA amplification persist and remain a significant computational challenge: these omissions introduce additional noise into the cellular data and ultimately impede downstream functional analysis of scRNA-seq data. It is therefore imperative to develop robust and efficient scRNA-seq imputation methods that improve downstream functional analysis. To address this, we designed an imputation framework, the deep generative autoencoder network (DGAN). In essence, DGAN is an enhanced variational autoencoder designed to robustly impute dropouts in scRNA-seq data represented as a sparse gene expression matrix. DGAN models both the count distribution and the data sparsity with a Gaussian model, and exploits cell-to-cell dependencies to detect and exclude outlier cells during imputation. When tested on five publicly available scRNA-seq datasets, DGAN outperformed every baseline method it was compared against in downstream functional analyses, including cell data visualization, clustering, classification and differential expression analysis. DGAN is implemented in Python and is available at https://github.com/dikshap11/DGAN.
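The abstract describes DGAN only at a high level. As a rough illustration of the variational-autoencoder idea it builds on, the sketch below computes a standard VAE objective for one cell (a Gaussian reconstruction term plus the KL divergence of a diagonal Gaussian posterior from a standard normal prior), shows the reparameterization trick, and fills zero entries, treated here as dropouts, with reconstructed values. It is not DGAN's actual architecture or loss: the function names, the fixed observation variance `sigma2` and the "replace only zeros" rule are assumptions for illustration, and the encoder/decoder networks are omitted.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (the VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def gaussian_nll(x, x_hat, sigma2=1.0):
    """Negative log-likelihood of x under N(x_hat, sigma2 * I); sigma2 is an assumed constant."""
    n = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 0.5 * sq / sigma2 + 0.5 * n * math.log(2.0 * math.pi * sigma2)

def vae_loss(x, x_hat, mu, log_var, sigma2=1.0):
    """Negative ELBO for one cell: reconstruction term plus KL term."""
    return gaussian_nll(x, x_hat, sigma2) + kl_to_standard_normal(mu, log_var)

def impute_dropouts(x, x_hat):
    """Hypothetical imputation rule: keep observed values, replace zeros with reconstructions."""
    return [rec if obs == 0 else obs for obs, rec in zip(x, x_hat)]

# Example with made-up numbers: one preprocessed cell and a decoder output.
x = [0.0, 2.1, 0.0, 1.3]
x_hat = [0.8, 2.0, 0.4, 1.2]
print(impute_dropouts(x, x_hat))  # [0.8, 2.1, 0.4, 1.3]
```

Keeping observed non-zero values untouched is one simple design choice; a model could equally output a full denoised matrix.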

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Schematic of the deep generative autoencoder network (DGAN) downstream functional analysis pipeline for scRNA-seq data. The real (raw) input matrix ‘m’ is filtered to remove low-quality genes, normalized by library size, log-transformed and scaled. The processed matrix is then fed into the DGAN model, which learns a representation of the gene expression data and reconstructs the imputed matrix. Finally, the imputed matrix is used for extensive downstream analysis.
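The preprocessing steps in the caption (gene filtering, library-size normalization, log transformation and scaling) can be sketched in plain Python as follows, with cells as rows and genes as columns. The `min_cells` threshold, the target library size of 10,000, the use of log(1 + x) and per-gene z-scoring are common conventions assumed here, not values taken from the paper.

```python
import math

def filter_genes(matrix, min_cells=3):
    """Keep genes (columns) detected (count > 0) in at least `min_cells` cells (rows)."""
    n_genes = len(matrix[0])
    keep = [g for g in range(n_genes) if sum(1 for row in matrix if row[g] > 0) >= min_cells]
    return [[row[g] for g in keep] for row in matrix], keep

def normalize_library_size(matrix, target=1e4):
    """Scale each cell so its counts sum to `target` (an assumed size factor)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v * target / total if total > 0 else 0.0 for v in row])
    return out

def log_transform(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(v) for v in row] for row in matrix]

def scale_genes(matrix):
    """Z-score each gene across cells; genes with zero variance are set to 0."""
    n_cells = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n_cells
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n_cells)
        scaled_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def preprocess(matrix, min_cells=3, target=1e4):
    """Filter -> normalize -> log1p -> scale; returns the matrix and kept gene indices."""
    filtered, keep = filter_genes(matrix, min_cells)
    return scale_genes(log_transform(normalize_library_size(filtered, target))), keep

# Example: 4 cells x 3 genes of made-up counts.
counts = [[1, 0, 3], [2, 0, 1], [0, 5, 2], [4, 0, 0]]
X, kept = preprocess(counts, min_cells=2)
print(kept)  # [0, 2]
```

Returning the kept gene indices makes it possible to map imputed columns back to gene names after the model runs.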
Figure 2
Violin plots of the log coefficient of variation, computed for each gene across cells, for the real Basile dataset and for the data imputed with each of the compared models. The box represents the interquartile range, the horizontal line marks the median, and the whiskers extend beyond the interquartile range.
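The per-gene statistic plotted in Figure 2 can be computed as below. This sketch assumes the coefficient of variation is the standard deviation divided by the mean of each gene across cells, using the population standard deviation and the natural logarithm; the paper may use a different variance estimator or log base. Genes with zero mean or zero variance are skipped because their log CV is undefined.

```python
import math
import statistics

def log_cv_per_gene(matrix):
    """Return {gene_index: log(sd / mean)} computed across cells (rows)."""
    result = {}
    for g, col in enumerate(zip(*matrix)):
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        if mean > 0 and sd > 0:
            result[g] = math.log(sd / mean)
    return result

# Example: gene 0 varies across cells, gene 1 is constant and is skipped.
print(log_cv_per_gene([[1.0, 2.0], [3.0, 2.0]]))
```

Running the same function on the real and on each imputed matrix gives the per-model distributions that a violin plot would summarize.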
Figure 3
Clustering analysis. (A) Two-dimensional t-SNE visualization of the clusters in the pre-imputed (real) Karen scRNA-seq dataset and in the matrices imputed with DeepImpute, DCA, GraphSCI, PBLR and DGAN. Cells are coloured according to their cell groups. (B) Clustering performance of DeepImpute, DCA, GraphSCI, PBLR and DGAN, measured by the adjusted Rand index (ARI), Fowlkes-Mallows index (FMI) and silhouette coefficient (SC).
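The three clustering scores in panel (B) have standard definitions and can be computed from predicted cluster labels, reference cell-group labels and, for the silhouette coefficient, cell coordinates. The pure-Python sketch below follows the textbook formulas (pair counting for ARI and FMI, mean Euclidean distances for SC). In practice a library such as scikit-learn (`adjusted_rand_score`, `fowlkes_mallows_score`, `silhouette_score`) would normally be used, and the feature space in which SC is computed is an assumption here.

```python
import math
from collections import Counter

def _comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

def _pair_counts(labels_true, labels_pred):
    """Return (sum_ij C(n_ij,2), sum_i C(a_i,2), sum_j C(b_j,2), C(n,2))."""
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(_comb2(v) for v in contingency.values())
    sum_a = sum(_comb2(v) for v in Counter(labels_true).values())
    sum_b = sum(_comb2(v) for v in Counter(labels_pred).values())
    return sum_ij, sum_a, sum_b, _comb2(len(labels_true))

def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (index - expected) / (max_index - expected), using pair counts."""
    sum_ij, sum_a, sum_b, total = _pair_counts(labels_true, labels_pred)
    expected = sum_a * sum_b / total
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def fowlkes_mallows_index(labels_true, labels_pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over pairs of cells."""
    sum_ij, sum_a, sum_b, _ = _pair_counts(labels_true, labels_pred)
    return sum_ij / math.sqrt(sum_a * sum_b) if sum_a and sum_b else 0.0

def silhouette_coefficient(points, labels):
    """Mean silhouette over all points with Euclidean distance (singletons score 0)."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        dists = {}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lj, []).append(math.dist(p, q))
        own = dists.get(li, [])
        others = [sum(d) / len(d) for lab, d in dists.items() if lab != li]
        if not own or not others:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(others)          # mean distance to the nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Example: a perfect clustering up to label permutation.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

ARI and FMI compare labelings only, so they are unaffected by how clusters are numbered, while SC depends on the geometry of the embedding.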
Figure 4
(A) Performance on the Zeisel dataset; the coloured bars represent the real data and the data imputed with the DCA, GSCI, PBLR and DGAN models. (B) AUC-ROC of various classification algorithms applied to the data imputed by the different models; line colours represent the different algorithms.
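AUC-ROC, as reported in panel (B), equals the probability that a randomly chosen positive cell receives a higher classifier score than a randomly chosen negative cell. The sketch below computes it for a binary problem by comparing all positive/negative pairs (the Mann-Whitney U formulation), counting ties as one half. Only the binary case is shown; a multi-class cell-type setting would need an averaging scheme such as one-vs-rest, and which scheme the authors used is not stated in the caption.

```python
def auc_roc(labels, scores):
    """Binary AUC-ROC: P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cells; a rank-based computation gives the same value in O(n log n) for large datasets.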
Figure 5
Performance of DGAN on a large-scale dataset: box-and-whisker plots of the log2 fold change (log2FC) and p-values (pval) obtained by differential expression analysis of the PBMC data with the different models.
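The two quantities summarized in Figure 5 can be computed per gene by comparing the cells of one group against another. The sketch below uses log2 of the ratio of mean expressions with a small pseudocount, and a two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value from the normal approximation, with average ranks for ties but no tie correction of the variance. Both the pseudocount and the choice of test are assumptions; the caption does not say which differential expression test was used.

```python
import math
from statistics import NormalDist, fmean

def log2_fold_change(group_a, group_b, pseudocount=1e-9):
    """log2(mean(a) / mean(b)); the pseudocount avoids division by zero."""
    return math.log2((fmean(group_a) + pseudocount) / (fmean(group_b) + pseudocount))

def rank_sum_pvalue(group_a, group_b):
    """Two-sided Mann-Whitney U p-value via the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    # Rank the pooled sample; tied values share their average (1-based) rank.
    pooled = sorted((v, i) for i, v in enumerate(list(group_a) + list(group_b)))
    ranks = [0.0] * (n1 + n2)
    k = 0
    while k < len(pooled):
        m = k
        while m + 1 < len(pooled) and pooled[m + 1][0] == pooled[k][0]:
            m += 1
        avg_rank = (k + m) / 2 + 1
        for t in range(k, m + 1):
            ranks[pooled[t][1]] = avg_rank
        k = m + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example with made-up expression values for one gene in two cell groups.
a = [10, 11, 12, 13, 14]
b = [1, 2, 3, 4, 5]
print(round(log2_fold_change(a, b), 3), rank_sum_pvalue(a, b) < 0.05)
```

Repeating this for every gene on the real and on each imputed matrix yields the log2FC and p-value distributions that such a plot compares across models.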
