Automated labeling in medical data: A semi-supervised density-based approach for efficient diagnosis model development
- PMID: 40857819
- DOI: 10.1016/j.compbiomed.2025.110963
Abstract
Background: As medical data acquisition expands rapidly, automated diagnosis and analysis models are needed to support healthcare practitioners. Formulating such a diagnosis model conventionally requires labeling the entire dataset manually. The machine learning and human intervention tasks involved are demanding, expensive, and error-prone.
Method: To reduce this effort, the presented work aims to improve the performance of semi-supervised learning by automating the labeling process, thereby decreasing development cost. The approach is demonstrated on benchmark medical datasets in which only a small subset of the samples is labeled. Labeling is performed by identifying peak-density samples and constructing density clusters from the unlabeled data. The distribution of samples within each cluster is then analyzed to identify high- and low-confidence regions. Samples within these regions are appended to the labeled dataset and mapped to the class of the peak sample. Only a small subset of the data therefore needs to be selected for manual labeling; its labels can then be propagated to the rest of the data, minimizing the project budget.
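The labeling idea described in the Method paragraph can be illustrated with a minimal sketch. This is not the authors' SSDCCR implementation: the abstract does not give the density estimator, the peak-selection rule, or the definition of the confidence regions, so everything here is an assumption. The sketch assumes a Gaussian-kernel local density, a density-peak style selection score (density times distance to the nearest denser sample), and a high-confidence region defined as the samples closer to their peak than a per-cluster distance quantile. The function name `density_peak_pseudo_labels` and the parameters `cutoff`, `conf_quantile`, and `label_of_peak` (a stand-in for manually labeling one peak sample) are hypothetical.

```python
import numpy as np

def density_peak_pseudo_labels(X, n_peaks, label_of_peak, cutoff, conf_quantile=0.5):
    """Illustrative sketch only; all design choices are assumptions, not SSDCCR."""
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumed local density: Gaussian kernel sum, minus the self contribution.
    rho = np.exp(-(D / cutoff) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # sample indices, densest first

    # For each sample, find the nearest sample that ranks as denser.
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = D.max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(D[i, denser])]
        delta[i] = D[i, j]
        parent[i] = j

    # Peak samples: largest density * separation; the densest sample is always a peak.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    peaks = np.argsort(-gamma)[:n_peaks]

    # Build density clusters by following each sample's denser neighbor.
    cluster = np.full(n, -1)
    cluster[peaks] = np.arange(n_peaks)
    for i in order:
        if cluster[i] == -1:
            cluster[i] = cluster[parent[i]]

    # Only the peak samples are "manually" labeled; members inherit the peak's class.
    peak_class = np.array([label_of_peak(int(p)) for p in peaks])
    labels = peak_class[cluster]

    # Assumed confidence region: within a per-cluster quantile of distance to the peak.
    dist_to_peak = D[np.arange(n), peaks[cluster]]
    high_conf = np.zeros(n, dtype=bool)
    for c in range(n_peaks):
        members = cluster == c
        radius = np.quantile(dist_to_peak[members], conf_quantile)
        high_conf[members] = dist_to_peak[members] <= radius
    return labels, high_conf, peaks
```

In this sketch only `n_peaks` samples would ever need a manual label; a downstream classifier could be trained first on the samples flagged `high_conf` and treat the remainder more cautiously. The pairwise distance matrix makes the sketch quadratic in memory, so it says nothing about the scalability reported for the actual method.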
Results and conclusion: The results suggest that the proposed SSDCCR (Semi-Supervised Density-Based Clustering with a Confidence Region) outperforms existing algorithms across multiple health datasets, with a significant increase of at least 2 percent in accuracy. The approach is scalable to larger datasets and memory-efficient, with lower complexity.
Keywords: Confidence regions; Density-based clustering; Peak samples; Semi-supervised learning.
Copyright © 2025 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.