Nat Commun. 2025 Jul 1;16(1):5508.
doi: 10.1038/s41467-025-60521-x.

An improved reference library and method for accurate cell-type deconvolution of bulk-tissue miRNA data

Shaoying Zhu et al. Nat Commun.

Abstract

MicroRNAs (miRNAs) play key roles in development and disease and have great biomarker potential. However, because miRNA expression is highly cell-type specific, identifying miRNA biomarkers from complex tissues is hampered by the underlying cell-type heterogeneity. Because current single-cell RNA-Seq protocols lag behind in quantifying miRNA expression, and most miRNA profiling samples lack matched mRNA expression or DNA methylation data for cell-type deconvolution, computational methods for estimating cell-type proportions from bulk-tissue miRNA data are urgently needed. Here we present a novel miRNA expression reference library and deconvolution tool for estimating the cell-type composition of complex tissues. We show that our tool is accurate and robust for deconvolution in whole blood as well as in different solid tissues. By applying this tool to a range of biological contexts, we demonstrate its value for screening age-associated miRNAs, for monitoring the immune landscape in infectious diseases such as COVID-19, and for identifying cell-type-specific miRNA biomarkers for early diagnosis and prognosis of human cancers. Our work establishes a computational framework for accurate cell-type mixture deconvolution of miRNA data.
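
To make the idea of reference-based deconvolution concrete, the following Python sketch models a bulk miRNA profile as a non-negative mixture of cell-type reference profiles and recovers the mixing fractions. It is an illustration only: the abstract does not state which algorithm the tool uses, so the solver here (non-negative least squares via multiplicative updates) is a generic stand-in, and the function name `deconvolve`, the reference matrix, and all numbers are hypothetical.

```python
# Hypothetical reference library: rows are miRNAs, columns are cell types.
# Values are made-up, strictly positive expression levels.
REFERENCE = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 12.0],
    [5.0, 5.0, 1.0],
]
TRUE_FRACTIONS = [0.6, 0.3, 0.1]  # assumed ground truth for the toy mixture


def deconvolve(reference, bulk, n_iter=500):
    """Estimate cell-type fractions f >= 0 such that reference @ f ~ bulk.

    Uses multiplicative updates for non-negative least squares, which
    require non-negative inputs, then rescales the result to sum to 1.
    """
    n_types = len(reference[0])
    # Precompute R^T b and R^T R once.
    rtb = [sum(row[k] * y for row, y in zip(reference, bulk))
           for k in range(n_types)]
    rtr = [[sum(row[j] * row[k] for row in reference) for k in range(n_types)]
           for j in range(n_types)]
    f = [1.0 / n_types] * n_types  # start from a uniform guess
    for _ in range(n_iter):
        denom = [sum(rtr[j][k] * f[k] for k in range(n_types))
                 for j in range(n_types)]
        f = [f[j] * rtb[j] / denom[j] if denom[j] > 0 else 0.0
             for j in range(n_types)]
    total = sum(f)
    return [x / total for x in f]


# Build a noise-free bulk profile from the known fractions, then recover them.
bulk = [sum(r * p for r, p in zip(row, TRUE_FRACTIONS)) for row in REFERENCE]
estimated = deconvolve(REFERENCE, bulk)
print([round(x, 4) for x in estimated])
```

In this noise-free toy case the estimates converge to the fractions used to build the mixture. Real data would add measurement noise and cell types missing from the reference, which this sketch does not address.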

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Reference library construction for accurate cell type composition estimation of real and simulated datasets.
A t-SNE scatterplot based on the identified cell type-specific miRNAs. The plot shows distinct clustering of the six cell types, and most of the data variance can be attributed to cell-type differences. B Hierarchical clustering heatmap of the isolated leukocyte subtypes based on the mean expression of the miRNA signature. C Gene Ontology (GO) analysis of target genes associated with cell-specific miRNAs. miRNA target genes were screened and used to identify enriched biological process terms at an FDR-adjusted P-value of 0.05. Enrichment FDR values are size-encoded as indicated. D Stacked bar charts of the estimated cell fractions, indicating that CD4+ T cells constitute the major cell type in the purified samples. E Boxplots of the estimated cell fractions, indicating very low proportions of other cell types in the pure CD4+ T cell samples (n = 39 biological replicates). The boxes are bounded by the 25th and 75th percentiles and the center represents the median. The whiskers extend from each edge of the box to indicate the 1.5× interquartile range (IQR). F Barplots of the average cell proportions of each major cell type according to FACS and DeconmiR in 257 samples from Juzenas et al. Data are presented as mean values ± SE. G Scatterplots comparing the estimated cell fractions with the known fractions in the in-silico reconstructed mixture samples, where the mixing proportions are known. One scatterplot is shown for each blood cell subtype used in the reconstructions. In each case, the R² and RMSE values, rounded to two significant digits, are indicated. P-values are derived from the Pearson correlation test. Source data are provided as a Source Data file.
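
The evaluation in panel G, where mixtures are reconstructed in silico from known proportions and the estimates are scored by RMSE and R², can be sketched as follows. This is a minimal illustration under assumptions: R² is taken here as the squared Pearson correlation (the caption reports Pearson tests but does not define R²), and the helper names and toy values are hypothetical.

```python
import math


def mix_profiles(pure_profiles, proportions):
    """Weighted sum of pure cell-type profiles (an in-silico mixture)."""
    n_mirnas = len(next(iter(pure_profiles.values())))
    return [
        sum(proportions[cell] * pure_profiles[cell][i] for cell in pure_profiles)
        for i in range(n_mirnas)
    ]


def rmse(estimated, known):
    """Root-mean-square error between estimated and known fractions."""
    return math.sqrt(
        sum((e - k) ** 2 for e, k in zip(estimated, known)) / len(known)
    )


def r_squared(estimated, known):
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    mean_e = sum(estimated) / len(estimated)
    mean_k = sum(known) / len(known)
    cov = sum((e - mean_e) * (k - mean_k) for e, k in zip(estimated, known))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_k = sum((k - mean_k) ** 2 for k in known)
    return cov * cov / (var_e * var_k)


# Toy mixture of two hypothetical cell types measured on two miRNAs.
mixture = mix_profiles(
    {"A": [10.0, 2.0], "B": [0.0, 4.0]},
    {"A": 0.25, "B": 0.75},
)

# Known versus estimated fractions of one cell type across four mixtures.
known = [0.10, 0.20, 0.30, 0.40]
estimated = [0.12, 0.18, 0.33, 0.37]
print(mixture, round(rmse(estimated, known), 4), round(r_squared(estimated, known), 4))
```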
Fig. 2. Accuracy evaluation of DeconmiR and comparison with other methods.
A Deconvolution results for simulated rare components at varying proportions; the changes in RMSE and R² values track the rare-component proportion (n = 100 replicates at each proportion level). The boxes are bounded by the 25th and 75th percentiles and the center represents the median. The whiskers extend from each edge of the box to indicate the 1.5× interquartile range (IQR). B Average RMSE and R² values for the deconvolution results of simulated rare components at varying proportions. C Relative cell fractions predicted by DeconmiR for 14 adult human whole-blood samples, compared with flow cytometry fractions; the corresponding RMSE and R² values are presented. The P-value is derived from the Pearson correlation test. D Comparison of DeconmiR with other deconvolution methods: barplots of RMSE and R² values for the cellular proportions estimated by the four algorithms on the reconstructed mixture samples. E Average RMSE and R² values for the different algorithms across cell types. Source data are provided as a Source Data file.
Fig. 3. Deconvolution of AML and healthy samples using DeconmiR.
A Stacked bar charts of the cellular fractions for the 112 AML samples from TCGA. B Scatter plots of the cellular fractions predicted by DeconmiR and CIBERSORT (left panel), and by DeconmiR and EpiDISH (middle panel), for AML samples. Pearson correlation coefficients and p-values were calculated between the cellular fractions estimated by the different methods. The corresponding RMSE values for the comparison are also presented (right panel). C Boxplots of the cell fraction distribution in AML (n = 112) and healthy (n = 77) samples across the six blood cell types. ****p < 0.0001 for two-sided Wilcoxon rank sum test (p = 7.61e−16, 1.89e−17, and 5.19e−4 for CD8+ T cell, NK cell and Neutrophil, respectively). The boxes are bounded by the 25th and 75th percentiles and the center represents the median. The whiskers extend from each edge of the box to indicate the 1.5× interquartile range (IQR). D Clustering heatmap of the cell-type fractions in AML samples. Two sample groups with different cell-type fraction distributions can be observed. E Kaplan–Meier survival plot indicating different prognoses for these two groups of patients. The survival difference between the groups is calculated by log-rank test. Source data are provided as a Source Data file.
Fig. 4. DeconmiR improves detection sensitivity for aging-related miRNAs in blood and for cell-type composition aberrations associated with SARS-CoV-2 infection.
A Boxplot of cellular proportions of 38 age-related blood samples as inferred by DeconmiR. The boxes are bounded by the 25th and 75th percentiles and the center represents the median. The whiskers extend from each edge of the box to indicate the 1.5× interquartile range (IQR). B Heatmap of associations between the top principal components and potential confounders of the miRNA profiling data. P-values are derived from the Pearson correlation test for numerical variables and from the Kruskal-Wallis test for categorical variables. C QQ-plot for all miRNAs passing quality control from supervised analysis against age, adjusted only for gender (“No adjustment”). The P-value is derived from simple regression and was q-value adjusted. D QQ-plot for all miRNAs passing quality control from supervised analysis against age, adjusted for gender and for cell proportions as estimated by DeconmiR (“Cell type adjusted”). The P-value is derived from multiple regression and was q-value adjusted. E Venn diagram indicating the numbers of specific and shared age-related miRNAs with and without adjustment for cell-type proportions. F Boxplot of cell-type composition across six cell types in COVID-19 patients (n = 31) and healthy control (n = 16) samples. See (A) for boxplot definition. P-values are derived from two-sided Wilcoxon rank sum test. G Boxplots of cell-type composition for CD4+ T cells and neutrophils in COVID-19 patients with different clinical features (n = 15, 19 and 13 for mild, moderate and serious, respectively). See (A) for boxplot definition. P-values are derived from two-sided Wilcoxon rank sum test. Source data are provided as a Source Data file.
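
The contrast between panels C and D, regression against age without and with cell-proportion covariates, can be illustrated with a small ordinary least-squares sketch. Everything below is hypothetical toy data: expression is constructed to depend only on a neutrophil fraction that rises with age, gender is omitted for brevity, and `ols` is an illustrative helper, not part of DeconmiR.

```python
def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns.

    Solves the normal equations by Gaussian elimination with partial
    pivoting. Returns [intercept, coef_1, coef_2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        rest = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (A[r][p] - rest) / A[r][r]
    return beta


# Toy cohort: the neutrophil fraction increases with age (made-up values).
age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
neutrophil = [0.40, 0.46, 0.49, 0.56, 0.60, 0.67]
# Expression depends on the neutrophil fraction only, not on age itself.
expression = [2.0 + 10.0 * f for f in neutrophil]

unadjusted = ols([age], expression)            # expression ~ age
adjusted = ols([age, neutrophil], expression)  # expression ~ age + neutrophil

print("age slope, unadjusted:", round(unadjusted[1], 4))
print("age slope, cell-type adjusted:", round(adjusted[1], 4))
```

In this constructed example the unadjusted age slope is clearly positive, while the adjusted slope is essentially zero because the apparent age effect is carried entirely by cell composition. The reverse can also occur: adjustment can reveal associations that composition differences mask.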
Fig. 5. Reference construction and cell type deconvolution of human breast tissues.
A Hierarchical clustering heatmap of the 10-cell-type reference matrix constructed for breast tissue. B Stacked bar charts of the cell fractions inferred by DeconmiR for 4 breast epithelium cell line samples from the microRNAome project. C Boxplot of cellular proportions inferred by DeconmiR for 36 whole blood samples from the microRNAome project. The boxes are bounded by the 25th and 75th percentiles and the center represents the median. The whiskers extend from each edge of the box to indicate the 1.5× interquartile range (IQR). D Boxplot of cell-type composition across six major cell categories for breast cancer (n = 1085) and healthy control (n = 104) samples from TCGA. See (C) for boxplot definition. P-values are derived from two-sided Wilcoxon rank sum test. E Kaplan–Meier survival plot for two groups of patients classified by macrophage level. The survival difference between the groups is calculated by log-rank test. Source data are provided as a Source Data file.
Fig. 6. Applications of DeconmiR in solid tumors to identify cancer biomarkers.
A Strategy used to construct a gold-standard list of breast cancer differentially expressed miRNAs (DEMs). Briefly, two separate lists of DEMs were derived by comparing breast cancer cell lines with normal breast epithelial cells from the FANTOM5 and microRNAome projects, respectively. Taking the overlap of these two lists yielded a high-confidence set of breast cancer DEMs occurring in the epithelial compartment of the breast. B Sensitivity and specificity of three DEM identification methods: unadjusted analysis, analysis adjusted using DeconmiR-estimated cell-type fractions as independent covariates, and SVA. C Identification of epithelial-specific smoking-associated differentially expressed miRNAs and their relevance in lung cancer. Smoothed scatterplot of CellDMC t-statistics in epithelial cells (x-axis) versus those in immune cells (y-axis) derived from a cohort of 48 pairs of lung adenocarcinoma and non-malignant lung tissue. Pearson correlation coefficients and p-values were calculated. Green dashed lines indicate the level of statistical significance (FDR < 0.05). Orange and green points mark the up- and downregulated smoking miRNAs derived from bronchial airway with smoking as the phenotype. Those passing the statistical significance threshold are labeled. D Boxplots indicating that the top-ranked smoking-related miRNAs in bronchial airway, miR-183-3p and miR-139-5p, are also differentially expressed between normal and cancer samples (n = 48). The boxes are bounded by the 25th and 75th percentiles and the center represents the median. The whiskers extend from each edge of the box to indicate the 1.5× interquartile range (IQR). P-values are derived from two-sided Wilcoxon rank sum test. E Scatterplots of the average expression of smoking up- and downregulated miRNAs against the epithelial and immune cell fractions. Pearson correlation coefficients and p-values were calculated. F Scatterplots of expression levels of the top-ranked smoking-associated epithelial-specific miRNA miR-183-3p against the epithelial cell fraction (left panel) and total immune cell fraction (right panel) across the 559 LUAD samples from TCGA. Pearson correlation coefficients and p-values were calculated. G Scatterplot of Spearman correlation coefficients (y-axis) between expression and epithelial fraction, as computed over the 559 TCGA LUAD samples, against the CellDMC t-statistics for predicted epithelial-specific miRNAs derived from GSE110907. Dashed lines indicate the level of statistical significance (FDR < 0.05). The P-value is from Fisher’s exact test on the miRNAs passing significance in each quadrant. H Average expression levels of cell-specific miRNAs derived with CellDMC from GSE110907, in the normal-adjacent (n = 46) and LUAD (n = 513) samples from TCGA. See (D) for boxplot definition. P-values are derived from two-sided Wilcoxon rank sum test. Source data are provided as a Source Data file.
