Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data
- PMID: 33039573
- DOI: 10.1016/j.ymeth.2020.10.001
Abstract
Single-cell protein abundance is a fundamental type of information for characterizing cell states. Due to high cost and technical barriers, however, direct quantification of proteins is difficult. Single-cell RNA sequencing (scRNA-seq) data, which serve as a cost-effective substitute for single-cell proteomics, may not accurately reflect protein expression levels because of measurement error, noise, post-transcriptional and translational regulation, etc. Recently emerging single-cell multimodal omics technologies, e.g. CITE-seq and REAP-seq, can simultaneously profile RNA and protein abundances in single cells, providing labeled data for predictive modeling in a supervised learning framework. A deep neural network-based transfer learning method has been applied to impute surface protein abundances from single-cell transcriptomic data. However, it is unclear whether an artificial neural network is the best model, and it is desirable to improve the prediction performance (e.g. accuracy, interpretability) of machine learning models. In this paper, we compared several tree-based ensemble learning methods with neural network models and found that ensemble learning often performed better than neural networks, with Random Forest (RF) performing the best overall. Moreover, we used the feature importance scores from RF to interpret the biological mechanisms underlying the prediction. Our study demonstrates the effectiveness of ensemble learning for reliable protein abundance prediction using single-cell multimodal omics data, and paves the way for knowledge discovery by mining single-cell multi-omics data at large scale.
Keywords: CITE-seq; Ensemble learning; Protein abundance; REAP-seq; Single cell; Transcriptomic.
Copyright © 2020 Elsevier Inc. All rights reserved.
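As an illustration of the supervised setup the abstract describes (RNA profiles as features, a measured surface protein as the label, RF feature importance for interpretation), a minimal sketch follows. It is not the authors' pipeline: the data are synthetic, the gene indices, the log1p transform, the hyperparameters, and the use of Pearson correlation as the evaluation metric are all assumptions made for the example, and scikit-learn's RandomForestRegressor stands in for whatever RF implementation the paper used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for CITE-seq-like data (NOT real measurements):
# rows are cells, columns are genes, entries are RNA counts.
n_cells, n_genes = 600, 50
rna_counts = rng.poisson(lam=2.0, size=(n_cells, n_genes)).astype(float)
rna_log = np.log1p(rna_counts)  # assumed normalization for the example

# Hypothetical surface protein whose abundance depends on genes 0 and 1.
protein = (2.0 * rna_log[:, 0] + 1.0 * rna_log[:, 1]
           + rng.normal(0.0, 0.1, size=n_cells))

# Hold out cells to estimate how well protein levels are predicted from RNA.
X_train, X_test, y_train, y_test = train_test_split(
    rna_log, protein, test_size=0.25, random_state=0
)

# Tree-based ensemble: a Random Forest regressor (assumed hyperparameters).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Agreement between predicted and held-out protein abundance.
pred = model.predict(X_test)
pearson_r = np.corrcoef(pred, y_test)[0, 1]

# Impurity-based feature importance: which genes drive the prediction.
importances = model.feature_importances_
top_genes = np.argsort(importances)[::-1][:5]

print(f"Pearson r on held-out cells: {pearson_r:.3f}")
print("Top genes by RF importance:", top_genes.tolist())
```

On real data, the columns of the feature matrix would be named genes, so the ranked importance scores could be read as candidate genes associated with each protein; impurity-based importances can be biased, so they are best treated as hypotheses rather than conclusions.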