Sci Adv. 2020 Jul 22;6(30):eaba2619.
doi: 10.1126/sciadv.aba2619. eCollection 2020 Jul.

Deep learning-based cell composition analysis from tissue expression profiles

Kevin Menden et al. Sci Adv. 2020.

Abstract

We present Scaden, a deep neural network for cell deconvolution that uses gene expression information to infer the cellular composition of tissues. Scaden is trained on single-cell RNA sequencing (RNA-seq) data to engineer discriminative features that confer robustness to bias and noise, making complex data preprocessing and feature selection unnecessary. We demonstrate that Scaden outperforms existing deconvolution algorithms in both precision and robustness. A single trained network reliably deconvolves bulk RNA-seq and microarray, human and mouse tissue expression data and leverages the combined information of multiple datasets. Because of this stability and flexibility, we surmise that deep learning will become an algorithmic mainstay for cell deconvolution of various data types. Scaden's software package and web application are easy to use on new as well as diverse existing expression datasets available in public resources, deepening the molecular and cellular understanding of developmental and disease processes.

Figures

Fig. 1. Overview of training data generation and cell type deconvolution with Scaden.
(A) Artificial bulk samples are generated by subsampling random cells from an scRNA-seq dataset and merging their expression profiles. (B) Model training and parameter optimization on simulated tissue RNA-seq data by comparing cell fraction predictions to ground-truth cell composition. (C) Cell deconvolution of real tissue RNA-seq data using Scaden.
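The sample generation in panel (A) can be sketched as follows. This is a minimal illustration, not the Scaden implementation: the function name `simulate_bulk_samples`, its parameters, and the choice of a flat Dirichlet distribution for drawing target cell fractions are all assumptions made for the example. It subsamples random cells from a single-cell count matrix, sums their expression profiles into one artificial bulk profile, and records the resulting cell type fractions as ground truth.

```python
import numpy as np


def simulate_bulk_samples(counts, cell_types, n_samples=4,
                          cells_per_sample=100, seed=0):
    """Build artificial bulk profiles from single-cell data (hypothetical helper).

    counts:     (n_cells, n_genes) array of single-cell expression values
    cell_types: length n_cells sequence of cell type labels
    Returns (bulk, fractions, types), where bulk is (n_samples, n_genes) and
    fractions is (n_samples, n_types), ordered like `types`.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    types = np.unique(cell_types)

    bulk = np.zeros((n_samples, counts.shape[1]))
    fractions = np.zeros((n_samples, len(types)))

    for i in range(n_samples):
        # Assumption: target fractions come from a flat Dirichlet distribution.
        target = rng.dirichlet(np.ones(len(types)))
        # Turn the target fractions into whole numbers of cells per type.
        n_per_type = rng.multinomial(cells_per_sample, target)

        chosen = []
        for cell_type, n_cells in zip(types, n_per_type):
            pool = np.flatnonzero(cell_types == cell_type)
            # Sample with replacement so rare cell types cannot run out.
            chosen.append(rng.choice(pool, size=n_cells, replace=True))
        idx = np.concatenate(chosen)

        # Merge the expression profiles of the sampled cells by summing them.
        bulk[i] = counts[idx].sum(axis=0)
        # The ground-truth composition is the realized fraction of each type.
        fractions[i] = n_per_type / cells_per_sample

    return bulk, fractions, types
```

Pairs of `bulk` rows and `fractions` rows produced this way correspond to the simulated training examples used in panel (B), where predicted cell fractions are compared with the known composition.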
Fig. 2. Performance comparison of deconvolution algorithms on simulated tissue data.
(A) Boxplots of the cell type prediction CCC and RMSE for four simulated PBMC datasets. Tables S14 and S16 contain information on the five (six for CS) cell types used. (B) Scatterplots of ground-truth (x axis) and predicted (y axis) values for four pancreas cell types, for Scaden, CSx, and MuSiC on artificial pancreas data (20). Numbers inside the plotting area and in parentheses signify CCC values.
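For readers unfamiliar with the two metrics, the sketch below shows one way to compute them, assuming CCC denotes Lin's concordance correlation coefficient and RMSE the root mean square error between ground-truth and predicted cell fractions. The function names `ccc` and `rmse` are illustrative and are not taken from the Scaden package.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient between two vectors."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population (biased) variances and covariance, as in Lin's definition.
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Unlike Pearson correlation, the mean difference is penalized as well.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def rmse(y_true, y_pred):
    """Root mean square error between two vectors."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A CCC of 1 requires predictions to lie on the identity line, so a method that ranks samples correctly but systematically over- or underestimates a cell type scores lower on CCC than it would on Pearson correlation.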
Fig. 3. Comparison of deconvolution algorithms on PBMC tissue RNA-seq data.
(A) Per–cell type scatterplots of ground-truth (x axis) and predicted values (y axis) for Scaden, CS, CSx, and MuSiC on real PBMC1 and PBMC2 cell fractions. Numbers inside the plotting area signify CCC values. For Scaden, the CCC using only scRNA-seq training data is shown in parentheses, and the CCC using mixed scRNA-seq and RNA-seq training data is shown without parentheses. (B) Boxplots of RMSE values for real PBMC1 and PBMC2 data. (C) CCC values for real PBMC1 and PBMC2 data.
Fig. 4. Deconvolution performance comparison on brain tissue RNA-seq data.
(A) Prediction of human brain cell fractions of the ROSMAP dataset using the Darmanis dataset as a reference: scatterplots of ground-truth (x axis) and predicted values (y axis) for Scaden, CSx, and MuSiC. CCC values are shown as insets. (B) Per–cell type CCC values for ROSMAP using the Darmanis data as a reference. (C) Neuronal content determined by Scaden trained on mouse brain data and evaluated on the Braak stage of the ROSMAP study.

