A novel logistic regression model combining semi-supervised learning and active learning for disease classification
- PMID: 30158596
- PMCID: PMC6115447
- DOI: 10.1038/s41598-018-31395-5
Abstract
Traditional supervised classifiers need many labeled samples to perform well, but in many biological datasets only a small fraction of the samples are labeled and the rest are unlabeled. Labeling these samples manually is difficult or expensive. Techniques such as active learning and semi-supervised learning have been proposed to exploit unlabeled samples and improve model performance. However, active learning models can be short-sighted or biased, and some manual labeling is still required, while semi-supervised learning methods are easily affected by noisy samples. In this paper we propose a novel logistic regression model based on the complementarity of active learning and semi-supervised learning, which uses the unlabeled samples at the least cost to improve disease classification accuracy. In addition, a mechanism for updating pseudo-labeled samples is designed to reduce the number of falsely pseudo-labeled samples. The experimental results show that the new model achieves better performance than widely used semi-supervised learning and active learning methods in disease classification and gene selection.
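The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the objective function, the sample selection criteria, or the update rule, so everything below is an assumption made for illustration. The sketch assumes plain unregularized logistic regression fitted by gradient descent, uncertainty sampling as the active learning step, and a fixed confidence threshold `tau` for pseudo-labeling. The names `active_semi_supervised_fit`, `oracle`, `tau`, and `rounds` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(y = 1 | x) for weights w = [bias, w_1, ..., w_d]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logreg(X, y, lr=0.5, epochs=300):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def active_semi_supervised_fit(X_lab, y_lab, X_unl, oracle, rounds=5, tau=0.9):
    """Hypothetical loop combining active and semi-supervised learning.

    oracle(x) stands in for the human annotator and returns a true label.
    tau is an assumed confidence threshold for accepting a pseudo-label.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    pseudo = []  # (sample, pseudo_label) pairs, rebuilt every round
    for _ in range(rounds):
        w = fit_logreg(X_lab + [x for x, _ in pseudo],
                       y_lab + [lab for _, lab in pseudo])
        if not pool:
            break
        # Active learning step: ask the oracle about the least certain sample.
        i = min(range(len(pool)),
                key=lambda k: abs(predict_proba(w, pool[k]) - 0.5))
        queried = pool.pop(i)
        X_lab.append(queried)
        y_lab.append(oracle(queried))
        # Semi-supervised step: rebuild all pseudo-labels from the current
        # model, so earlier pseudo-labels that are no longer confident are
        # dropped and ones whose predicted class changed are relabeled.
        pseudo = []
        for x in pool:
            p = predict_proba(w, x)
            if max(p, 1.0 - p) >= tau:
                pseudo.append((x, 1 if p >= 0.5 else 0))
    return fit_logreg(X_lab + [x for x, _ in pseudo],
                      y_lab + [lab for _, lab in pseudo])
```

In this sketch the two steps complement each other in the way the abstract suggests: confident predictions are pseudo-labeled at no labeling cost, while the oracle is consulted only for the samples the model finds hardest. Rebuilding the pseudo-label set every round is one simple way to limit false pseudo-labels; the paper's actual update mechanism may differ.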
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Multi-class motor imagery EEG classification using collaborative representation-based semi-supervised extreme learning machine. Med Biol Eng Comput. 2020 Sep;58(9):2119-2130. doi: 10.1007/s11517-020-02227-4. Epub 2020 Jul 16. PMID: 32676841
- CPSS: Fusing consistency regularization and pseudo-labeling techniques for semi-supervised deep cardiovascular disease detection using all unlabeled electrocardiograms. Comput Methods Programs Biomed. 2024 Sep;254:108315. doi: 10.1016/j.cmpb.2024.108315. Epub 2024 Jul 4. PMID: 38991373
- ℓ1-norm based safe semi-supervised learning. Math Biosci Eng. 2021 Sep 7;18(6):7727-7742. doi: 10.3934/mbe.2021383. PMID: 34814272
- Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data. BMC Bioinformatics. 2023 Feb 9;24(1):43. doi: 10.1186/s12859-023-05141-2. PMID: 36759776 Free PMC article. Review.
- Unsupervised and semi-supervised learning: the next frontier in machine learning for plant systems biology. Plant J. 2022 Sep;111(6):1527-1538. doi: 10.1111/tpj.15905. Epub 2022 Jul 27. PMID: 35821601 Review.
Cited by
- A clinician's guide to understanding and critically appraising machine learning studies: a checklist for Ruling Out Bias Using Standard Tools in Machine Learning (ROBUST-ML). Eur Heart J Digit Health. 2022 Apr 12;3(2):125-140. doi: 10.1093/ehjdh/ztac016. eCollection 2022 Jun. PMID: 36713011 Free PMC article.
- Artificial intelligence for dementia prevention. Alzheimers Dement. 2023 Dec;19(12):5952-5969. doi: 10.1002/alz.13463. Epub 2023 Oct 14. PMID: 37837420 Free PMC article. Review.
- Active semi-supervised learning for biological data classification. PLoS One. 2020 Aug 19;15(8):e0237428. doi: 10.1371/journal.pone.0237428. eCollection 2020. PMID: 32813738 Free PMC article.