BMC Genomics. 2024 Sep 18;25(1):875.

doi: 10.1186/s12864-024-10728-x.

Deconvolution from bulk gene expression by leveraging sample-wise and gene-wise similarities and single-cell RNA-Seq data

Chenqi Wang^#¹, Yifan Lin^#¹, Shuchao Li¹, Jinting Guan²,³,⁴

Affiliations

¹ Department of Automation, Xiamen University, Xiamen, China.
² Department of Automation, Xiamen University, Xiamen, China. jtguan@xmu.edu.cn.
³ Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai, China. jtguan@xmu.edu.cn.
⁴ National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China. jtguan@xmu.edu.cn.

^# Contributed equally.

PMID: 39294558
PMCID: PMC11409548
DOI: 10.1186/s12864-024-10728-x

Chenqi Wang et al. BMC Genomics. 2024.


Abstract

Background: Bulk RNA-seq, which is widely adopted, measures gene expression averaged over cells. This masks cell type heterogeneity and confounds downstream analyses. Identifying the cellular composition and the cell type-specific gene expression profiles (GEPs) therefore facilitates the study of the mechanisms underlying various biological processes. Although single-cell RNA-seq captures cell type heterogeneity in gene expression, it requires specialized and expensive resources and is currently impractical for large numbers of samples or routine clinical settings. Computational deconvolution methods have recently been developed, but many of them estimate only cell type composition or only cell type-specific GEPs, requiring the other as input. More accurate deconvolution methods that infer both cell type abundance and cell type-specific GEPs are still needed.

Results: We propose a new deconvolution algorithm, DSSC, which simultaneously infers cell type-specific gene expression and cell type proportions of heterogeneous samples by leveraging gene-gene and sample-sample similarities in bulk expression data together with single-cell RNA-seq data. Through comparisons with existing methods, we demonstrate that DSSC is effective in inferring both cell type proportions and cell type-specific GEPs on simulated pseudo-bulk data (intra-dataset and inter-dataset simulations) and on experimental bulk data (mixture data and real experimental data). DSSC is robust to changes in the number of marker genes and in sample size, and it is efficient in both cost and time.

Conclusions: DSSC provides a practical and promising alternative to experimental techniques for characterizing the cellular composition and gene expression heterogeneity of heterogeneous samples.

Keywords: Cell type abundance; Cell type-specific gene expression profile; Deconvolution; Similarity matrix; Single-cell RNA-seq data.
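
As a rough illustration of the setting described in the abstract, the sketch below builds noise-free pseudo-bulk data under the common linear mixing assumption, bulk = C × P, where C holds cell type-specific GEPs (genes × cell types) and P holds cell type proportions (cell types × samples). It then recovers P from a known C by ordinary least squares. This is not the DSSC algorithm, which infers C and P together and uses similarity matrices; the function names, matrix sizes, and random distributions are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_pseudo_bulk(n_genes=50, n_types=3, n_samples=8, seed=0):
    """Build hypothetical C, P and the noise-free bulk matrix X = C @ P."""
    rng = np.random.default_rng(seed)
    # Cell type-specific GEPs (genes x cell types), non-negative by construction.
    C = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))
    # Proportions (cell types x samples); each column sums to 1.
    P = rng.dirichlet(alpha=np.ones(n_types), size=n_samples).T
    # Each bulk sample is a proportion-weighted average of the GEPs.
    return C, P, C @ P


def estimate_proportions(C, X):
    """Reference-based estimate of P given C (assumption: plain least squares,
    then clipping to non-negative values and renormalising each column)."""
    P_hat, *_ = np.linalg.lstsq(C, X, rcond=None)
    P_hat = np.clip(P_hat, 0.0, None)
    return P_hat / P_hat.sum(axis=0, keepdims=True)


C, P, X = simulate_pseudo_bulk()
P_hat = estimate_proportions(C, X)
print(np.abs(P_hat - P).max())  # close to zero in this noise-free case
```

With real bulk data the mixing is noisy and C is not known exactly, which is why methods that estimate both matrices, as the abstract describes, are of interest.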

Conflict of interest statement

The authors have declared no competing interests.

Figures

**Fig. 1**
Schematic representations of (A) the DSSC algorithm and (B) the deconvolution testing pipeline. We distinguish two variants: DSSC3 keeps all three terms in the objective function, whereas DSSC2 keeps only the first two terms, which involve the sample-sample and gene-gene similarity matrices. DSSC2 does not need the reference GEPs to infer the initial matrices C and P; it needs the reference only to determine the cell type labels, i.e., the grey paths in the figure are not used in DSSC2

**Fig. 2**
Intra-dataset deconvolution results. A PCC and RMSE between the inferred cell type proportion matrix (or GEP matrix) and the real one. The deconvolution methods are divided into three categories: methods with GEPs as reference (shown in blue), methods with scRNA-seq data as reference (shown in yellow), and other deconvolution methods (shown in green). B PCC between the inferred cell type proportion matrix (or GEP matrix) and the real matrix, each point representing the corresponding result in Fig. 2A

**Fig. 3**
Inter-dataset deconvolution results, covering the cases of matching and non-matching cell types between the training and testing sets before deconvolution. A PCC and RMSE between the inferred cell type proportion matrix (or GEP matrix) and the real one in the case of matching cell types. B PCC between the inferred cell type proportion matrix (or GEP matrix) and the real matrix, each point representing the corresponding result in Fig. 3A. C PCC and RMSE between the inferred cell type proportion matrix (or GEP matrix) and the real one in the case of non-matching cell types. D PCC between the inferred cell type proportion matrix (or GEP matrix) and the real matrix, each point representing the corresponding result in Fig. 3C. Baron_Muraro denotes Baron data as the testing set and Muraro data as the training set; the other labels are read in the same way

**Fig. 4**
Deconvolution results on mixture data. A PCC and RMSE between the inferred cell type proportion matrix and the real one, and between the inferred GEP matrix and the reference one. Because the real GEP matrix is unknown, we used the reference matrix for evaluation instead; the reference comes from purified samples, so the two GEP matrices should be similar. B Boxplot of PCC values, each point representing the corresponding result in Fig. 4A

**Fig. 5**
Deconvolution results on real experimental WholeBlood data for three different references. The single-cell data-based methods were tested only with the references that are in the form of single-cell data, i.e., 3’ PBMCs and 5’ PBMCs. (A) PCC and RMSE between the inferred cell type proportion matrix and the real one. (B) PCC between the inferred cell type proportions and the real ones for each sample, each point representing a sample. (C) The per-sample PCC averaged across all three references and across two references (3’ PBMCs and 5’ PBMCs), each point representing a sample. (D) PCC values of each sample calculated from the cell type proportion matrix averaged across three references and across two references

**Fig. 6**
The influence of marker gene number and sample size on intra-dataset deconvolution results, using Nestorowa data as an example. A Change in the PCC of the cell type proportion matrix as the number of marker genes increases, for a fixed number of samples. B Change in the PCC of the cell type proportion matrix as the sample size increases, for a fixed number of marker genes
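
The captions above report PCC (Pearson correlation coefficient) and RMSE (root mean square error) between an inferred matrix and the real one. A minimal sketch of how such metrics can be computed is given below. Flattening the whole matrix before comparison is an assumption, as is the per-sample variant modelled on the description of Fig. 5B; the paper may aggregate differently, and the function names and toy values are hypothetical.

```python
import numpy as np


def pcc(estimated, truth):
    """Pearson correlation between two matrices, compared entry by entry."""
    a = np.asarray(estimated, dtype=float).ravel()
    b = np.asarray(truth, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])


def rmse(estimated, truth):
    """Root mean square error between two matrices of the same shape."""
    a = np.asarray(estimated, dtype=float)
    b = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def pcc_per_sample(estimated, truth):
    """One PCC per column, assuming columns are samples (cell types x samples)."""
    a = np.asarray(estimated, dtype=float)
    b = np.asarray(truth, dtype=float)
    return [float(np.corrcoef(a[:, j], b[:, j])[0, 1]) for j in range(a.shape[1])]


# Toy proportion matrices (3 cell types x 2 samples); values are made up.
truth = np.array([[0.2, 0.5], [0.3, 0.1], [0.5, 0.4]])
estimated = np.array([[0.25, 0.45], [0.30, 0.15], [0.45, 0.40]])
print(pcc(estimated, truth), rmse(estimated, truth))
print(pcc_per_sample(estimated, truth))
```

A higher PCC and a lower RMSE both indicate closer agreement, but they can disagree: a constant offset leaves PCC at 1 while raising RMSE, which is one reason to report the two together.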

Cited by

Revolutionizing Implantation Studies: Uterine-Specific Models and Advanced Technologies.
Li SY, DeMayo FJ. Biomolecules. 2025 Mar 20;15(3):450. doi: 10.3390/biom15030450. PMID: 40149986. Free PMC article. Review.

