Hybrid statistical and machine-learning approach to hearing-loss identification based on an oversampling technique
- PMID: 39672012
- DOI: 10.1016/j.compbiomed.2024.109539
Abstract
Background and objectives: Hearing loss is a major global health problem with considerable social and physiological effects on spoken language and cognition. Affected patients may experience social and professional hardships, and hearing loss is among the dominant occupational injuries. Identifying the features of recessive hearing loss is therefore important for helping clinicians prevent further disease progression. This work aimed to develop a hybrid statistical and machine-learning approach as a decision-support mechanism, which we expect to help predict hearing-loss disorders and support clinical diagnosis.
Methods: A three-phase hybrid approach was proposed for building classification models. In phase I, a stepwise method and a random forest (RF) technique served as filters for feature selection, reducing the number of input variables to the most influential features. In phase II, the synthetic minority oversampling technique (SMOTE) was used to oversample the minority class and balance the sample sizes of the target and nontarget classes. Phase III focused on final model selection among three supervised classification models, namely logistic regression, a multilayer perceptron, and a support vector machine (SVM), for identifying and predicting the case of interest (i.e., hearing loss).
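The abstract does not state how SMOTE was configured. As a rough illustration of phase II, the following is a minimal plain-Python sketch of the standard SMOTE interpolation: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. It assumes numeric feature vectors, Euclidean distance, exactly two classes, and k = 5; the names `smote` and `balance` are hypothetical and are not taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class
    samples and their k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, other), j)
            for j, other in enumerate(minority)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # random base sample
        base = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # random neighbour of it
        gap = rng.random()                        # uniform in [0, 1)
        # The new point lies on the segment between base and nn.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.
    Hypothetical helper; assumes exactly two classes in y."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = max(0, n_majority - len(minority))
    new_points = smote(minority, n_needed, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data: 3 minority (label 1) vs 5 majority (label 0) samples.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0], [7.0, 7.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance(X, y)
print(y_bal.count(1), y_bal.count(0))  # 5 5
```

In practice, oversampling is usually applied only to the training split so that synthetic points derived from test samples do not leak into evaluation, and library implementations such as the `SMOTE` class in imbalanced-learn are the more common choice.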
Results: In phase I, the stepwise technique and the RF method selected three and seven features, respectively. In phases II and III, SMOTE alleviated the class-imbalance problem and substantially improved predictive capability. In terms of accuracy, precision, recall, and F1 score, our empirical results demonstrated that the hybrid approach combining the SVM method with the stepwise technique was competitive with the logistic model that included all variables. Furthermore, the SVM models combined with the stepwise and RF techniques outperformed the other approaches in terms of the area under the curve (AUC).
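The models are compared on accuracy, precision, recall, F1 score, and AUC. The following minimal sketch shows how these metrics are commonly computed for a binary classifier, assuming labels in which 1 marks the positive (hearing-loss) class. The AUC uses the rank (Mann-Whitney) interpretation, i.e. the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names `classification_metrics` and `roc_auc` are hypothetical, and the toy data are illustrative only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(n_pos * n_neg), fine for small data."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

With this toy data there are two true positives, one false positive, one false negative, and two true negatives, so accuracy is 4/6, precision, recall, and F1 are each 2/3, and the AUC is 8/9.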
Conclusion: Compared with multivariate models, the hybrid approach coupling the SVM method with a stepwise technique and/or an RF technique is an efficient alternative: it requires fewer predictors and remains competitive in terms of accuracy, precision, recall, F1 score, and AUC. This work highlights the potential of hybrid statistical and machine-learning approaches. Our model can be used as a screening tool for early prediction in clinical practice, and the proposed hybrid approach shows a strong capability to identify vital features and predict hearing loss.
Keywords: Feature selection; Hearing loss; Logistic regression; Multilayer perceptron; Support vector machine; Synthetic minority oversampling technique.
Copyright © 2024 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Joint modeling strategy for using electronic medical records data to build machine learning models: an example of intracerebral hemorrhage. BMC Med Inform Decis Mak. 2022 Oct 25;22(1):278. doi: 10.1186/s12911-022-02018-x. PMID: 36284327. Free PMC article.
- Data Augmentation and Machine Learning algorithms for multi-class imbalanced morphometrics data of stingless bees. Heliyon. 2025 Jan 23;11(3):e42214. doi: 10.1016/j.heliyon.2025.e42214. eCollection 2025 Feb 15. PMID: 39931483. Free PMC article.
- Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage. Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8. PMID: 30891794. Free PMC article.
- Interpretable machine learning model to predict surgical difficulty in laparoscopic resection for rectal cancer. Front Oncol. 2024 Feb 6;14:1337219. doi: 10.3389/fonc.2024.1337219. eCollection 2024. PMID: 38380369. Free PMC article. Review.
- Artificial intelligence in clinical care amidst COVID-19 pandemic: A systematic review. Comput Struct Biotechnol J. 2021;19:2833-2850. doi: 10.1016/j.csbj.2021.05.010. Epub 2021 May 7. PMID: 34025952. Free PMC article. Review.