Intrinsic entropy model for feature selection of scRNA-seq data
- PMID: 35102420
- PMCID: PMC9175189
- DOI: 10.1093/jmcb/mjac008
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive study of cellular heterogeneity and cell-to-cell variation. However, the high frequency of dropout events and the noise in scRNA-seq data confound downstream analyses such as clustering, whose accuracy depends heavily on the selected feature genes. Here, by deriving an entropy decomposition formula, we propose a feature selection method, the intrinsic entropy (IE) model, to identify informative genes for accurate clustering analysis. Specifically, by eliminating the 'noisy' fluctuation, or extrinsic entropy (EE), we extract the IE of each gene from its total entropy (TE), where TE = IE + EE. We show that the IE of a gene reflects the regulatory fluctuation of that gene in a cellular process, and thus high-IE genes provide rich information for cell type or cell state analysis. To validate the performance of the high-IE genes, we conduct computational analyses on both simulated datasets and real single-cell datasets and compare the IE model with other representative methods. The results show that the IE model is not only broadly applicable and robust across different clustering and classification methods, but also sensitive to novel cell types. Our results also demonstrate that the intrinsic entropy/fluctuation of a gene serves as information rather than noise, in contrast to its total entropy/fluctuation.
Keywords: entropy decomposition; extrinsic entropy; feature selection; informative genes; intrinsic entropy; scRNA-seq.
© The Author(s) (2022). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, CEMCS, CAS.
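The additive form TE = IE + EE stated in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's decomposition formula, so the code below is not the IE model. As a stand-in it uses the standard chain rule of Shannon entropy, H(X) = I(X; C) + H(X | C), which also splits a gene's total entropy into a structured part and a residual part. The helper names (`entropy`, `conditional_entropy`, `decompose`), the use of known cell labels, and the toy data are all assumptions made for illustration. The IE model itself is a feature selection step applied before clustering and is not stated to require labels.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    # p * log2(1/p) summed over the observed values
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def conditional_entropy(values, labels):
    """H(values | labels): size-weighted average of within-group entropy."""
    n = len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def decompose(values, labels):
    """Split total entropy into (total, structured, residual).

    structured = I(values; labels), residual = H(values | labels),
    so total == structured + residual by the chain rule.
    This is an illustrative analogue, not the paper's IE/EE formula.
    """
    total = entropy(values)
    residual = conditional_entropy(values, labels)
    structured = total - residual
    return total, structured, residual


# Hypothetical toy data: discretized expression of two genes in 8 cells.
labels = ["A"] * 4 + ["B"] * 4
genes = {
    "marker_gene": [0, 0, 0, 0, 2, 2, 2, 2],  # varies with cell type
    "noisy_gene": [0, 2, 0, 2, 0, 2, 0, 2],   # varies independently of type
}

# Rank genes by the structured part of their entropy, highest first.
ranking = sorted(
    genes, key=lambda g: decompose(genes[g], labels)[1], reverse=True
)
for name in ranking:
    total, structured, residual = decompose(genes[name], labels)
    print(f"{name}: total={total:.3f} structured={structured:.3f} "
          f"residual={residual:.3f}")
```

In this toy example both genes have the same total entropy, yet only one of them carries entropy that tracks the cell labels. This mirrors the abstract's point that total fluctuation alone does not separate informative genes from noisy ones.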
Similar articles
- sc-REnF: An entropy guided robust feature selection for single-cell RNA-seq data. Brief Bioinform. 2022 Mar 10;23(2):bbab517. doi: 10.1093/bib/bbab517. PMID: 35037023
- Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study. Int J Mol Sci. 2020 Mar 22;21(6):2181. doi: 10.3390/ijms21062181. PMID: 32235704. Free PMC article.
- Mcadet: A feature selection method for fine-resolution single-cell RNA-seq data based on multiple correspondence analysis and community detection. PLoS Comput Biol. 2024 Oct 28;20(10):e1012560. doi: 10.1371/journal.pcbi.1012560. eCollection 2024 Oct. PMID: 39466833. Free PMC article.
- Machine learning and statistical methods for clustering single-cell RNA-sequencing data. Brief Bioinform. 2020 Jul 15;21(4):1209-1223. doi: 10.1093/bib/bbz063. PMID: 31243426. Review.
- Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data. Comput Biol Med. 2022 Jul;146:105697. doi: 10.1016/j.compbiomed.2022.105697. Epub 2022 Jun 8. PMID: 35697529. Review.
Cited by
- Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci. 2025 Jan;68(1):5-102. doi: 10.1007/s11427-023-2561-0. Epub 2024 Jul 23. PMID: 39060615. Review.
- scRDEN: single-cell dynamic gene rank differential expression network and robust trajectory inference. Sci Rep. 2025 May 15;15(1):16963. doi: 10.1038/s41598-025-01969-1. PMID: 40374885. Free PMC article.
- Neuroactive network tissue based on dual-factor neuroregenerative bioactive coating scaffolds and neural stem cells for spinal cord injury repair. Mater Today Bio. 2025 Aug 5;34:102172. doi: 10.1016/j.mtbio.2025.102172. eCollection 2025 Oct. PMID: 40822932. Free PMC article.