A learning method for the class imbalance problem with medical data sets
- PMID: 20347072
- DOI: 10.1016/j.compbiomed.2010.03.005
Abstract
Medical data sets are predominantly composed of "normal" samples, with only a small percentage of "abnormal" ones, which leads to the so-called class imbalance problem. In such problems, feeding all the data into the classifier to build the learning model usually biases learning toward the majority class. To deal with this, this paper uses a strategy that over-samples the minority class and under-samples the majority class to balance the data sets. For the majority class, the paper builds a Gaussian-type fuzzy membership function and applies an alpha-cut to reduce the data size; for the minority class, it uses the mega-trend diffusion membership function to generate virtual samples. Furthermore, after balancing the class sizes, the paper extends the attribute dimension into a higher-dimensional space using classification-related information to enhance classification accuracy. Two medical data sets, Pima Indians' diabetes and BUPA liver disorders, are employed to illustrate the approach. The results indicate that the proposed method has better classification performance than SVM, the C4.5 decision tree, and the methods of two other studies.
2010 Elsevier Ltd. All rights reserved.
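The abstract names the two resampling steps but gives no formulas, so the following is only a minimal sketch of how they might look, not the authors' implementation. It assumes a per-attribute Gaussian membership centred on the class mean for the alpha-cut, and the commonly cited mega-trend diffusion (MTD) bounds with the constant 1e-20 for the minority class. Drawing virtual values from a triangular distribution, treating attributes independently, and all function names and the `alpha` default are illustrative assumptions.

```python
import math
import random
import statistics


def alpha_cut_undersample(rows, alpha=0.5):
    """Keep majority-class rows whose Gaussian membership is >= alpha (assumed rule)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero spread

    def membership(row):
        # Product over attributes of exp(-(x - mean)^2 / (2 * std^2)); lies in (0, 1].
        z2 = sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds))
        return math.exp(-0.5 * z2)

    return [row for row in rows if membership(row) >= alpha]


def mtd_bounds(values):
    """Return (lower, centre, upper) of an MTD-style triangular membership function."""
    lo, hi = min(values), max(values)
    centre = (lo + hi) / 2
    if lo == hi:  # degenerate attribute: nothing to diffuse
        return lo, centre, hi
    n_low = sum(v < centre for v in values)
    n_high = sum(v > centre for v in values)
    var = statistics.variance(values)
    skew_low = n_low / (n_low + n_high)
    skew_high = n_high / (n_low + n_high)
    spread = -2.0 * var * math.log(1e-20)  # positive, since log(1e-20) < 0
    lower = centre - skew_low * math.sqrt(spread / n_low)
    upper = centre + skew_high * math.sqrt(spread / n_high)
    # Assumption: the diffused range is never narrower than the observed range.
    return min(lower, lo), centre, max(upper, hi)


def mtd_virtual_samples(rows, n_new, seed=0):
    """Draw n_new virtual minority rows, one triangular draw per attribute."""
    rng = random.Random(seed)
    bounds = [mtd_bounds(col) for col in zip(*rows)]
    return [[rng.triangular(lo, hi, mode) for lo, mode, hi in bounds]
            for _ in range(n_new)]


majority = [[1.0], [1.1], [0.9], [1.0], [5.0]]
minority = [[1.0], [2.0], [3.0], [4.0], [10.0]]
kept = alpha_cut_undersample(majority, alpha=0.5)
virtual = mtd_virtual_samples(minority, n_new=3)
```

In this toy run the outlying majority row `[5.0]` falls below the alpha threshold and is dropped, while the virtual minority values are drawn from a range at least as wide as the observed one. A larger `alpha` removes more of the majority class, so it would need tuning against the target class ratio.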
Similar articles
- Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data. Bioinformatics. 2006 Apr 15;22(8):981-8. doi: 10.1093/bioinformatics/btl027. Epub 2006 Jan 27. PMID: 16443633
- Active learning methods for interactive image retrieval. IEEE Trans Image Process. 2008 Jul;17(7):1200-11. doi: 10.1109/TIP.2008.924286. PMID: 18586627
- Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Trans Pattern Anal Mach Intell. 2006 Jul;28(7):1088-99. doi: 10.1109/TPAMI.2006.134. PMID: 16792098
- Classification and knowledge discovery in protein databases. J Biomed Inform. 2004 Aug;37(4):224-39. doi: 10.1016/j.jbi.2004.07.008. PMID: 15465476
- A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform. 2019 Feb;90:103089. doi: 10.1016/j.jbi.2018.12.003. Epub 2019 Jan 3. PMID: 30611011 Review.
Cited by
- Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia. Sci Rep. 2017 Aug 7;7(1):7402. doi: 10.1038/s41598-017-07408-0. PMID: 28784991 Free PMC article.
- Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry Ford exercIse testing (FIT) project. BMC Med Inform Decis Mak. 2017 Dec 19;17(1):174. doi: 10.1186/s12911-017-0566-6. PMID: 29258510 Free PMC article.
- Computational analysis of variability and uncertainty in the clinical reference on magnetic resonance imaging radiomics: modelling and performance. Vis Comput Ind Biomed Art. 2024 Nov 19;7(1):28. doi: 10.1186/s42492-024-00180-9. PMID: 39557758 Free PMC article.
- Machine Learning Models for Classifying High- and Low-Grade Gliomas: A Systematic Review and Quality of Reporting Analysis. Front Oncol. 2022 Apr 22;12:856231. doi: 10.3389/fonc.2022.856231. eCollection 2022. PMID: 35530302 Free PMC article.
- Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data. JAMIA Open. 2023 May 31;6(2):ooad033. doi: 10.1093/jamiaopen/ooad033. eCollection 2023 Jul. PMID: 37266187 Free PMC article.