Adv Sci (Weinh). 2024 Feb;11(7):e2306329.
doi: 10.1002/advs.202306329. Epub 2023 Dec 10.

Highly Accurate Estimation of Cell Type Abundance in Bulk Tissues Based on Single-Cell Reference and Domain Adaptive Matching

Xinyang Guo et al. Adv Sci (Weinh). 2024 Feb.

Abstract

Accurately identifying the cellular composition of complex tissues is critical for understanding disease pathogenesis, early diagnosis, and prevention. However, current methods for deconvoluting bulk RNA sequencing (RNA-seq) data typically rely on matched single-cell RNA sequencing (scRNA-seq) data as a reference, which can be limiting due to differences in sequencing distribution and the potential for invalid information from single-cell references. Hence, a novel computational method named SCROAM is introduced to address these challenges. SCROAM transforms scRNA-seq and bulk RNA-seq data into a shared feature space, effectively eliminating distributional differences in the latent space. Subsequently, cell-type-specific expression matrices are generated from the scRNA-seq data, facilitating the precise identification of cell types within bulk tissues. The performance of SCROAM is assessed through benchmarking on simulated and real datasets, demonstrating its accuracy and robustness. To further validate SCROAM's performance, single-cell and bulk RNA-seq experiments are conducted on mouse spinal cord tissue, with SCROAM applied to identify cell types in bulk tissue. Results indicate that SCROAM is a highly effective tool for identifying similar cell types. An integrated analysis of liver cancer and primary glioblastoma is then performed. Overall, this research offers a novel perspective for delivering precise insights into disease pathogenesis and potential therapeutic strategies.

Keywords: deconvolution; tissue heterogeneity; transcriptomics; transfer learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of SCROAM. a) The deconvolution model that uses a reference requires two input datasets: bulk RNA‐seq count and a reference containing counts of scRNA‐seq reads. Additionally, the single‐cell transcriptome data must label the cell type to be quantified. b) SCROAM learns gene‐specific transformations of bulk data by utilizing the reference sequences observed in single‐cell data. This allows us to account for potential technical bias between sequencing technologies used in single‐cell and bulk RNA‐seq data. c) SCROAM begins with scRNA‐seq data and classifies the cells into different cell types, which were represented by different colors in the analysis. By calculating gene specificity in a given cell type, an expression matrix reflecting cell type specificity was constructed. d) SCROAM employs single‐cell reference data to estimate the cell type ratio in transformed bulk data.
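Panel d) describes estimating cell type ratios from a single-cell-derived reference. As a rough illustration of reference-based deconvolution in general, and not of SCROAM's own estimator, the sketch below solves a plain non-negative least-squares problem by projected gradient descent. The function name `estimate_fractions`, the toy signature matrix, and the iteration count are all assumptions made for this example.

```python
import numpy as np

def estimate_fractions(signature, bulk, n_iter=5000):
    """Generic non-negative least-squares deconvolution (illustrative only).

    signature: genes x cell_types matrix of cell-type-specific expression.
    bulk: expression vector of one bulk sample (length = genes).
    Returns non-negative cell type fractions normalized to sum to 1.
    """
    signature = np.asarray(signature, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    # The squared spectral norm bounds the curvature of the least-squares
    # objective, so 1 / lipschitz is a safe gradient step size.
    lipschitz = np.linalg.norm(signature, 2) ** 2
    n_types = signature.shape[1]
    weights = np.full(n_types, 1.0 / n_types)
    for _ in range(n_iter):
        gradient = signature.T @ (signature @ weights - bulk)
        # Gradient step followed by projection onto the non-negative orthant.
        weights = np.maximum(weights - gradient / lipschitz, 0.0)
    total = weights.sum()
    return weights / total if total > 0 else weights

# Toy example (assumed values): 4 genes, 3 cell types, noise-free mixture.
toy_signature = np.array([[10.0, 1.0, 0.0],
                          [0.0, 8.0, 1.0],
                          [1.0, 0.0, 9.0],
                          [2.0, 2.0, 2.0]])
toy_truth = np.array([0.6, 0.3, 0.1])
toy_bulk = toy_signature @ toy_truth
toy_estimate = estimate_fractions(toy_signature, toy_bulk)
```

On noise-free toy data like this, the estimate should recover the mixing fractions closely; real bulk data would additionally need the cross-platform correction that the caption attributes to the transformation step.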
Figure 2
Error distribution for each method in the pseudobulk experiment using data from the Tabula Muris Senis dataset. The experiment was conducted on eight distinct organs, and the errors were computed as the mean L1 error across cell types in each organ. a,c) Results for the Smart‐seq2 reference and 10x Chromium pseudobulk. b,d) Results for the 10x Chromium reference and Smart‐seq2 pseudobulk. The violin plots show the distribution of errors for each evaluated method, with white dots indicating the mean error. The grid plots use color to indicate the difference between the mean errors of the methods in each organ, with darker reds indicating relatively poorer performance.
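The error metric named in this caption can be sketched as follows, assuming "mean L1 error" means the absolute difference between true and estimated fractions averaged over cell types. The function name `mean_l1_error` and the example fractions are hypothetical, and the scale used in the paper (fractions versus percentages) is not stated here.

```python
import numpy as np

def mean_l1_error(true_fractions, estimated_fractions):
    """Mean absolute difference across cell types (last axis).

    Accepts a single sample (1-D) or a samples x cell_types array (2-D).
    """
    true_fractions = np.asarray(true_fractions, dtype=float)
    estimated_fractions = np.asarray(estimated_fractions, dtype=float)
    return np.abs(true_fractions - estimated_fractions).mean(axis=-1)

# Hypothetical fractions for one pseudobulk sample with three cell types.
example_error = mean_l1_error([0.6, 0.3, 0.1], [0.5, 0.35, 0.15])
```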
Figure 3
Results for the Large Intestine organ dataset using Smart‐seq2 as a reference. a) Comparison of results before and after data transformation, indicating that the data transformed by KMM resulted in lower error rates. b) The distance between the raw bulk data and the single‐cell reference is compared with the distance between the transformed data and the single‐cell reference. The value in each box represents the JSD distance between the sample and the cell. The distance between the transformed data and the single‐cell reference is significantly smaller than that of the raw bulk data, highlighting the effectiveness of the KMM data transformation step. c) The deconvolution analysis results for each sample are presented, demonstrating that the results using transformed data are generally higher than those without transformation.
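For readers unfamiliar with the measure in panel b), the sketch below computes the Jensen-Shannon divergence between two expression profiles treated as probability distributions. It is a minimal illustration under assumptions: the function name `js_divergence` is hypothetical, base-2 logarithms are used so the value lies in [0, 1], and the caption does not say whether the paper reports the divergence or its square root (often called the Jensen-Shannon distance).

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two non-negative vectors.

    Inputs are normalized to sum to 1 before comparison.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mixture = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)

# Identical profiles give 0; profiles with no shared support give 1.
same = js_divergence([3, 1, 1], [3, 1, 1])
disjoint = js_divergence([1, 0], [0, 1])
```

Taking `np.sqrt` of the returned value would give the corresponding distance.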
Figure 4
Evaluation of each applicable method using data from Dong,[21] which include known cell type proportions. a) Single‐cell clustering results and t‐SNE visualization of the three cell types in the dataset, MDA‐MB‐468, MCF‐7, and normal fibroblasts, with a ratio of ≈6:3:1. b) Benchmark of deconvolution results for bulk RNA‐seq samples generated by different methods. The proportions estimated by SCROAM have the lowest mean L1 error (2.7) relative to the ground truth, indicating superior accuracy in estimating cell type proportions.
Figure 5
Each applicable method was evaluated using data from the neural stem region of the mouse spinal cord. a) Following single‐cell clustering, t‐SNE visualization was generated, revealing six clusters: d_qNSCs, p_qNSCs_early, p_qNSCs_late, aNSCs, TAPs, NB. b) In the benchmarking of deconvolution results on bulk samples generated by different methods, SCROAM was observed to provide the most accurate estimation of the actual biological proportions among all the benchmarked methods.
Figure 6
Effect of cell ratio on patient survival. a) Effect of LSEC cell fraction on overall survival (OS), with patients exhibiting high levels of LSEC cells having longer survival times. b) The effect of cholangiocyte fraction on OS, with patients having a high proportion of cholangiocytes associated with a lower OS.
Figure 7
Relationship between cell status and prognosis of non‐malignant cells in various tumor types from the TCGA cohort. a) Violin plot visualizing the distribution of cell type fractions in each tumor type. The median is represented by a white dot and the upper and lower quartiles are represented by bars. b,c) Kaplan–Meier plots showing the association of oligodendrocyte (b) and pericyte (c) infiltration with survival in GBM.

References

    1. a) Carithers L. J., Moore H. M., The genotype‐tissue expression (GTEx) project, Vol. 13, Mary Ann Liebert, New Rochelle, NY, USA: 2015;
       b) Tomczak K., Czerwinska P., Wiznerowicz M., Contemp. Oncol. (Pozn) 2015, 19, A68.
    2. Saliba A.‐E., Westermann A. J., Gorski S. A., Vogel J., Nucleic Acids Res. 2014, 42, 8845.
    3. a) Denisenko E., Guo B. B., Jones M., Hou R., De Kock L., Lassmann T., Poppe D., Clément O., Simmons R. K., Lister R., Forrest A. R. R., Genome Biol. 2020, 21, 130;
       b) Kuksin M., Morel D., Aglave M., Danlos F.‐X., Marabelle A., Zinovyev A., Gautheret D., Verlingue L., Eur. J. Cancer 2021, 149, 193.
    4. a) Vallania F., Tam A., Lofgren S., Schaffert S., Azad T. D., Bongen E., Haynes W., Alsup M., Alonso M., Davis M., Engleman E., Khatri P., Nat. Commun. 2018, 9, 4735;
       b) Avila Cobos F., Vandesompele J., Mestdagh P., De Preter K., Bioinformatics 2018, 34, 1969;
       c) Sturm G., Finotello F., Petitprez F., Zhang J. D., Baumbach J., Fridman W. H., List M., Aneichyk T., Bioinformatics 2019, 35, i436.
    5. Jin H., Liu Z., Genome Biol. 2021, 22, 102.
