Review
Genomics Proteomics Bioinformatics. 2022 Oct;20(5):814-835.
doi: 10.1016/j.gpb.2022.11.011. Epub 2022 Dec 14.

Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review

Matthew Brendel et al. Genomics Proteomics Bioinformatics. 2022 Oct.

Abstract

Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profiles of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes. It has helped elucidate biological processes, such as those occurring during the development of complex organisms, and has improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance in artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, because it can extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review surveys recently developed deep learning techniques in scRNA-seq data analysis, identifies key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explains the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges that current deep learning approaches face with scRNA-seq data and discuss potential directions for improving deep learning algorithms for scRNA-seq data analysis.

Keywords: Artificial intelligence; Deep learning; Deep neural network; Single-cell RNA sequencing; Single-cell sequencing.

Conflict of interest statement

The authors have declared no competing interests.

Figures

Figure 1
Schematic of the common pipeline in scRNA-seq analysis. A. scRNA-seq data collection. B. scRNA-seq data preprocessing: imputation and denoising. C. scRNA-seq data preprocessing: representation learning for dimensionality reduction. D. scRNA-seq data preprocessing: doublet removal. E. scRNA-seq data preprocessing: cell cycle variance removal. F. scRNA-seq data preprocessing: batch effect removal. G. Downstream analysis of scRNA-seq data: cell clustering. H. Downstream analysis of scRNA-seq data: cell type annotation. I. Downstream analysis of scRNA-seq data: trajectory inference. scRNA-seq, single-cell RNA sequencing; M, mitotic phase, i.e., nuclear division of the cell (including prophase, metaphase, anaphase, and telophase); S, synthesis phase for the replication of the chromosomes (belonging to interphase); G1, gap 1 phase, representing the beginning of interphase; G2, gap 2 phase, representing the end of interphase, prior to entering the mitotic phase.
Figure 2
Illustration of deep learning architectures that have been used in scRNA-seq analysis. A. Basic design of a feed-forward neural network. B. A neural network is composed of “neurons” organized into layers. Each neuron combines the outputs of the prior layer using a set of weights, and passes the weighted sum through a non-linear activation function, such as sigmoid, rectifier (i.e., ReLU), or hyperbolic tangent, to produce a transformed output. C. Autoencoder, a special variant of the feed-forward neural network that aims to learn low-dimensional representations of data while preserving data information. D. DAE, a variant of the autoencoder developed to address the overfitting problems of autoencoders. A DAE partially corrupts the input data and tries to reconstruct the raw, uncorrupted data. E. VAE, a variant of the autoencoder in which the encoder compresses input data into a constrained multivariate latent distribution space that is regular enough for the decoder to generate new content from it. F. GAE. Building on the advanced deep learning architecture GNN, GAE has been developed and used in scRNA-seq analysis. The encoder of GAE considers both sample features (e.g., the gene expression profiles/counts of cells) and samples’ neighborhood information (e.g., topological structure of cellular interaction network) to produce low-dimensional representations while preserving topology in data. The decoder unpacks the low-dimensional representations to reconstruct the input network structure and/or sample features. ReLU, rectified linear unit; DAE, denoising autoencoder; VAE, variational autoencoder; GAE, graph autoencoder; GNN, graph neural network.
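
The layer computation in panel B and the corrupt-then-reconstruct idea in panels C and D can be made concrete with a small sketch. The following is a minimal illustration in plain Python, not code from the review or from any scRNA-seq tool. All function names (dense_layer, corrupt, autoencoder_forward, mse) are hypothetical, the weights are untrained, the toy vector stands in for one cell's expression profile, and masking noise is only one of several possible corruption schemes.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


# The three activation functions named in the caption of panel B.
ACTIVATIONS = {"sigmoid": sigmoid, "relu": relu, "tanh": math.tanh}


def dense_layer(inputs, weights, biases, activation="relu"):
    """One layer: each neuron takes a weighted sum of the prior layer's
    outputs, adds a bias, and applies a non-linear activation."""
    act = ACTIVATIONS[activation]
    return [
        act(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def corrupt(inputs, drop_prob, rng):
    """Masking corruption (an assumed choice): zero each value with
    probability drop_prob, as a DAE might do before encoding."""
    return [0.0 if rng.random() < drop_prob else x for x in inputs]


def autoencoder_forward(inputs, enc_w, enc_b, dec_w, dec_b):
    """Encode to a low-dimensional latent vector, then decode back."""
    latent = dense_layer(inputs, enc_w, enc_b, "relu")
    recon = dense_layer(latent, dec_w, dec_b, "relu")
    return latent, recon


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    cell = [3.0, 0.0, 1.0, 5.0]  # toy expression values for 4 genes
    # Untrained random weights: 4 genes -> 2 latent dims -> 4 genes.
    enc_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
    dec_w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
    noisy = corrupt(cell, 0.25, rng)
    latent, recon = autoencoder_forward(noisy, enc_w, [0.0] * 2, dec_w, [0.0] * 4)
    # A DAE scores the reconstruction against the uncorrupted input.
    print("latent size:", len(latent), "loss:", mse(recon, cell))
```

The detail specific to a denoising autoencoder is the last step: the loss compares the reconstruction with the original, uncorrupted vector rather than with the corrupted one that was fed to the encoder. Training (adjusting the weights to reduce this loss) is omitted here.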
