Adv Sci (Weinh). 2024 Feb;11(7):e2306329.

doi: 10.1002/advs.202306329. Epub 2023 Dec 10.

Highly Accurate Estimation of Cell Type Abundance in Bulk Tissues Based on Single-Cell Reference and Domain Adaptive Matching

Xinyang Guo¹, Zhaoyang Huang¹, Fen Ju², Chenguang Zhao², Liang Yu¹

Affiliations

¹ School of Computer Science and Technology, Xidian University, Xi'an, 710071, China.
² Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China.

PMID: 38072669
PMCID: PMC10870031
DOI: 10.1002/advs.202306329

Xinyang Guo et al. Adv Sci (Weinh). 2024 Feb.


Abstract

Accurately identifying the cellular composition of complex tissues is critical for understanding disease pathogenesis, early diagnosis, and prevention. However, current methods for deconvoluting bulk RNA sequencing (RNA-seq) data typically rely on matched single-cell RNA sequencing (scRNA-seq) data as a reference, which can be limiting due to differences in sequencing distribution and the potential for invalid information from single-cell references. Hence, a novel computational method named SCROAM is introduced to address these challenges. SCROAM transforms scRNA-seq and bulk RNA-seq data into a shared feature space, effectively eliminating distributional differences in the latent space. Subsequently, cell-type-specific expression matrices are generated from the scRNA-seq data, facilitating the precise identification of cell types within bulk tissues. The performance of SCROAM is assessed through benchmarking on simulated and real datasets, demonstrating its accuracy and robustness. To further validate SCROAM's performance, single-cell and bulk RNA-seq experiments are conducted on mouse spinal cord tissue, with SCROAM applied to identify cell types in bulk tissue. Results indicate that SCROAM is a highly effective tool for identifying similar cell types. An integrated analysis of liver cancer and primary glioblastoma is then performed. Overall, this research offers a novel perspective for delivering precise insights into disease pathogenesis and potential therapeutic strategies.

Keywords: deconvolution; tissue heterogeneity; transcriptomics; transfer learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Overview of SCROAM. a) The reference-based deconvolution model requires two input datasets: bulk RNA‐seq counts and a reference containing scRNA‐seq read counts. Additionally, the single‐cell transcriptome data must be labeled with the cell types to be quantified. b) SCROAM learns gene‐specific transformations of bulk data by utilizing the reference sequences observed in single‐cell data. This accounts for potential technical bias between the sequencing technologies used for single‐cell and bulk RNA‐seq data. c) SCROAM begins with scRNA‐seq data and classifies the cells into different cell types, which are represented by different colors. By calculating gene specificity in a given cell type, an expression matrix reflecting cell type specificity is constructed. d) SCROAM employs single‐cell reference data to estimate the cell type ratio in transformed bulk data.
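The reference-based workflow in panels c) and d) can be illustrated with a minimal sketch. It is not SCROAM itself: the per-type mean expression used as a signature, the multiplicative-update non-negative least squares solver, and the names `signature_matrix` and `deconvolve` are all assumptions made for illustration, and the bulk-data transformation from panel b) is omitted.

```python
# Illustrative sketch only: mean-expression signatures plus plain
# non-negative least squares (NNLS). SCROAM's own gene-specificity
# scoring and estimation procedure are not described in this record.

def signature_matrix(cells, labels):
    """Average each gene's expression over the cells of each labeled type."""
    types = sorted(set(labels))
    n_genes = len(cells[0])
    sig = {}
    for t in types:
        members = [c for c, lab in zip(cells, labels) if lab == t]
        sig[t] = [sum(c[g] for c in members) / len(members) for g in range(n_genes)]
    return types, sig

def deconvolve(bulk, types, sig, n_iter=500, eps=1e-12):
    """Solve bulk ~= S x with x >= 0 by multiplicative updates, then scale x to sum to 1."""
    cols = [sig[t] for t in types]
    k = len(cols)
    # Precompute S^T b and S^T S once; both are non-negative for count data.
    stb = [sum(s * b for s, b in zip(col, bulk)) for col in cols]
    sts = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    x = [1.0 / k] * k
    for _ in range(n_iter):
        denom = [sum(sts[i][j] * x[j] for j in range(k)) for i in range(k)]
        x = [x[i] * stb[i] / (denom[i] + eps) for i in range(k)]
    total = sum(x) or 1.0
    return {t: xi / total for t, xi in zip(types, x)}

# Toy data: four cells, four genes, two hypothetical cell types "A" and "B".
cells = [[10, 0, 5, 1], [12, 0, 5, 1], [0, 8, 5, 2], [0, 10, 5, 2]]
labels = ["A", "A", "B", "B"]
types, sig = signature_matrix(cells, labels)
# Synthetic bulk sample mixed at 70% A and 30% B.
bulk = [0.7 * a + 0.3 * b for a, b in zip(sig["A"], sig["B"])]
fractions = deconvolve(bulk, types, sig)
```

On this noise-free toy mixture the estimated fractions come out close to the 70:30 ratio used to build the bulk sample. Real data would additionally need the cross-platform correction that the caption attributes to SCROAM.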

**Figure 2**
Error distribution for each method in the pseudobulk experiment using data from the Tabula Muris Senis dataset. The experiment was conducted on eight distinct organs, and the errors were computed as the mean L1 error across the cell types in each organ. a,c) show the results for the Smart‐seq2 reference and 10x Chromium pseudobulk. b,d) show the results for the 10x Chromium pseudobulk and Smart‐seq2 pseudobulk. The violin plots present the distribution of errors for each evaluated method, with white dots indicating the mean error. The grid plots use colors to indicate the difference between the mean errors of the different methods in that organ, with darker reds indicating relatively poorer performance. These visualizations allow the performance of different methods to be compared across organs and experimental conditions.
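The mean L1 error named in this caption can be written out as a short sketch. The function name `mean_l1_error` and the dictionary layout are hypothetical, and whether the paper reports the metric on fractions or on percentages is not stated here; the example assumes fractions.

```python
def mean_l1_error(estimated, truth):
    """Mean absolute difference between estimated and true proportions,
    averaged over the cell types present in the ground truth."""
    cell_types = sorted(truth)
    # A cell type missing from the estimate is treated as a 0.0 estimate.
    total = sum(abs(estimated.get(t, 0.0) - truth[t]) for t in cell_types)
    return total / len(cell_types)

# Hypothetical proportions for three cell types in one sample.
truth = {"type_1": 0.6, "type_2": 0.3, "type_3": 0.1}
estimated = {"type_1": 0.55, "type_2": 0.33, "type_3": 0.12}
error = mean_l1_error(estimated, truth)
```

Here the per-type absolute errors are 0.05, 0.03, and 0.02, so the mean is about 0.033.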

**Figure 3**
Results for the Large Intestine organ dataset using Smart‐seq2 as a reference. a) The comparison of results before and after data transformation is shown, indicating that the data transformed by KMM resulted in lower error rates. b) The distance between the raw bulk data and the single‐cell reference is compared with the distance between the transformed data and the single‐cell reference. The value in each box represents the JSD distance between the sample and the cell. The results show that the distance between the transformed data and the single‐cell reference is significantly smaller than that of the raw bulk data, highlighting the effectiveness of the KMM data transformation step. c) The deconvolution analysis results for each sample are presented, demonstrating that the results using transformed data are generally higher than those without transformation.
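The JSD comparison in panel b) can be sketched as below, assuming JSD stands for the Jensen-Shannon divergence. Treating each expression profile as a probability distribution by dividing by its sum, using a base-2 logarithm, and reporting the divergence rather than its square root are all assumptions; the caption does not specify these details.

```python
import math

def normalize(profile):
    """Scale a non-negative expression profile so that it sums to 1."""
    total = sum(profile)
    return [value / total for value in profile]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def jsd(profile_a, profile_b):
    """Jensen-Shannon divergence between two profiles (0 = identical, 1 = disjoint)."""
    p, q = normalize(profile_a), normalize(profile_b)
    # The mixture m is positive wherever p or q is, so the KL terms stay finite.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical profiles over four genes.
reference_cell = [11.0, 0.0, 5.0, 1.0]
raw_bulk = [3.0, 6.0, 5.0, 3.0]
same_shape = [22.0, 0.0, 10.0, 2.0]  # the reference scaled by 2

d_raw = jsd(raw_bulk, reference_cell)
d_same = jsd(same_shape, reference_cell)
```

Because the profiles are normalized first, a rescaled copy of the reference has divergence 0, while the differently shaped bulk profile yields a positive value.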

**Figure 4**
Evaluation of each applicable method using data from Dong,[21] which includes known cell type proportions. a) shows the single‐cell clustering results and t‐SNE visualization of the three cell types in the dataset, MDA‐MB‐468, MCF‐7, and normal fibroblasts, with a ratio of ≈6:3:1. b) The benchmark of deconvolution results for bulk RNA‐seq samples generated by different methods is presented. The proportions estimated by SCROAM have the lowest mean L1 error (2.7) relative to the ground truth, indicating superior accuracy in estimating cell type proportions.

**Figure 5**
Each applicable method was evaluated using data from the neural stem region of the mouse spinal cord. a) Following single‐cell clustering, t‐SNE visualization was generated, revealing six clusters: d_qNSCs, p_qNSCs_early, p_qNSCs_late, aNSCs, TAPs, NB. b) In the benchmarking of deconvolution results on bulk samples generated by different methods, SCROAM was observed to provide the most accurate estimation of the actual biological proportions among all the benchmarked methods.

**Figure 6**
Effect of cell ratio on patient survival. a) Effect of LSEC cell fraction on overall survival (OS), with patients exhibiting high levels of LSEC cells having longer survival times. b) Effect of cholangiocyte fraction on OS, with a high proportion of cholangiocytes associated with a lower OS.

**Figure 7**
Relationship between cell status and prognosis of non‐malignant cells in various tumor types from the TCGA cohort. a) Violin plot visualizing the distribution of cell type fractions in each tumor type. The median is represented by a white dot and the upper and lower quartiles are represented by bars. b,c) The association between oligodendrocyte (b) and pericyte (c) infiltration with survival in GBM using Kaplan–Meier plots.

Cited by

FusionEncoder: identification of intrinsically disordered regions based on multi-feature fusion.
Liu S, Chen S, Bai T, Liu B. Bioinformatics. 2025 Jul 1;41(7):btaf362. doi: 10.1093/bioinformatics/btaf362. PMID: 40577786. Free PMC article.
Identification of DNA N6-methyladenine modifications in the rice genome with a fine-tuned large language model.
Zhang Y, Chen H, Xiang S, Lv Z. Front Plant Sci. 2025 Jun 25;16:1626539. doi: 10.3389/fpls.2025.1626539. eCollection 2025. PMID: 40636005. Free PMC article.
NeXtMD: a new generation of machine learning and deep learning stacked hybrid framework for accurate identification of anti-inflammatory peptides.
Xie C, Wei Y, Luo X, Yang H, Lai H, Dao F, Feng J, Lv H. BMC Biol. 2025 Jul 15;23(1):212. doi: 10.1186/s12915-025-02314-8. PMID: 40660190. Free PMC article.
msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths.
Li Y, Wei X, Yang Q, Xiong A, Li X, Zou Q, Cui F, Zhang Z. BMC Biol. 2024 May 30;22(1):126. doi: 10.1186/s12915-024-01923-z. PMID: 38816885. Free PMC article.
scRSSL: Residual semi-supervised learning with deep generative models to automatically identify cell types.
Gao Y, Duan H, Meng F, Zhang C, Li X, Li F. IET Syst Biol. 2025 Jan-Dec;19(1):e12107. doi: 10.1049/syb2.12107. Epub 2025 Apr 22. PMID: 40261690. Free PMC article.
