A novel logistic regression model combining semi-supervised learning and active learning for disease classification
- PMID: 30158596
- PMCID: PMC6115447
- DOI: 10.1038/s41598-018-31395-5
Abstract
Traditional supervised learning classifiers need a large number of labeled samples to perform well; however, many biological datasets contain only a small number of labeled samples, and the remaining samples are unlabeled. Labeling these samples manually is difficult or expensive. Techniques such as active learning and semi-supervised learning have been proposed to exploit unlabeled samples and improve model performance. However, active learning models can be short-sighted or biased, and some manual labeling is still required, while semi-supervised learning methods are easily affected by noisy samples. In this paper, we propose a novel logistic regression model that exploits the complementarity of active learning and semi-supervised learning to use the unlabeled samples at minimal cost and improve disease classification accuracy. In addition, a mechanism for updating pseudo-labeled samples is designed to reduce the number of falsely pseudo-labeled samples. Experimental results show that the new model achieves better performance than widely used semi-supervised learning and active learning methods in disease classification and gene selection.
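The abstract describes the method only at a high level. Below is a minimal, hypothetical sketch of one way to combine the two ideas around a logistic regression classifier; it is not the authors' algorithm. Uncertainty sampling (querying the sample whose predicted probability is closest to 0.5) stands in for active learning. Confidence-thresholded self-training stands in for semi-supervised learning. Rebuilding the pseudo-labeled set from scratch every round is an assumed, simplified stand-in for the paper's pseudo-label update mechanism. All function names, the `conf` threshold, the `oracle` callback (representing a human annotator), and the toy data are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logreg(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Fit L2-regularised logistic regression by batch gradient descent.

    Returns a (weights, bias) tuple.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability that sample x belongs to the positive class."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def al_ssl_loop(X_lab, y_lab, X_unl, oracle, rounds=5, conf=0.9):
    """Hypothetical active + semi-supervised loop (not the paper's exact method)."""
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(range(len(X_unl)))  # indices of samples without a true label
    model = fit_logreg(X_lab, y_lab)
    for _ in range(rounds):
        if not pool:
            break
        # Active learning: query the sample whose prediction is closest to 0.5.
        q = min(pool, key=lambda i: abs(predict_proba(model, X_unl[i]) - 0.5))
        X_lab.append(X_unl[q])
        y_lab.append(oracle(q))
        pool.remove(q)
        # Semi-supervised learning: pseudo-label confident samples. The set is
        # rebuilt every round, so a pseudo-label whose confidence drops is discarded.
        pseudo_X, pseudo_y = [], []
        for i in pool:
            p = predict_proba(model, X_unl[i])
            if p >= conf or p <= 1 - conf:
                pseudo_X.append(X_unl[i])
                pseudo_y.append(1 if p >= 0.5 else 0)
        model = fit_logreg(X_lab + pseudo_X, y_lab + pseudo_y)
    return model, len(X_lab)


# Toy usage: two labeled points and a 7x7 grid of unlabeled points whose
# hidden label is 1 when a + b > 0 (synthetic data, for illustration only).
X_lab = [[2.0, 2.0], [-2.0, -2.0]]
y_lab = [1, 0]
X_unl = [[float(a), float(b)] for a in range(-3, 4) for b in range(-3, 4)]
hidden = [1 if a + b > 0 else 0 for a, b in X_unl]
model, n_labeled = al_ssl_loop(X_lab, y_lab, X_unl, oracle=lambda i: hidden[i])
print(n_labeled, predict_proba(model, [3.0, 3.0]))
```

In this sketch each round costs one manual label (the queried sample), while confident predictions supply extra training samples at no labeling cost. Rebuilding the pseudo-labeled set each round lets a sample whose confidence falls below the threshold drop out instead of keeping a possibly wrong label; the paper's own update rule may differ.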
Conflict of interest statement
The authors declare no competing interests.
