Automated labeling in medical data: A semi-supervised density-based approach for efficient diagnosis model development

Lincy Meera Mathews¹, Inaguri Muni Sai Haneesh¹, S R Mani Sekhar², A L Anvitha¹, Amith Shubhan¹, S Akash¹

Affiliations

¹ Department of Information Science and Engineering, M S Ramaiah Institute of Technology, Bangalore, India.
² Department of Information Science and Engineering, M S Ramaiah Institute of Technology, Bangalore, India. Electronic address: manisekharsr@gmail.com.

PMID: 40857819
DOI: 10.1016/j.compbiomed.2025.110963

Lincy Meera Mathews et al. Comput Biol Med. 2025 Oct;197(Pt A):110963. doi: 10.1016/j.compbiomed.2025.110963. Epub 2025 Aug 25.

Abstract

Background: As medical data acquisition expands rapidly, healthcare practitioners increasingly need automated diagnosis and analysis models to support them. Formulating such a model conventionally requires labeling the entire dataset manually, and the combined machine learning and human intervention effort is demanding, expensive, and error-prone.

Method: To reduce this effort, the presented work aimed to improve the performance of semi-supervised learning by automating the labeling process, thereby decreasing development cost. The approach is demonstrated on benchmark medical datasets in which only a small subset of the samples is labeled. Labeling proceeds by identifying peak-density samples and constructing density clusters from the unlabeled data. The distribution of samples within each cluster is then analyzed to identify high- and low-confidence regions. Samples within these regions are appended to the labeled dataset and mapped to the class of the cluster's peak sample. Only a small subset of the data therefore needs to be selected for manual labeling; its labels are then propagated to the rest of the data, minimizing the project budget.
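The labeling idea described in the Method paragraph can be illustrated with a minimal sketch. This is not the authors' SSDCCR implementation: the abstract does not specify the density estimator, the peak criterion, or how confidence regions are delimited. The sketch assumes a Gaussian-kernel density, borrows the common density-peaks heuristic (density multiplied by distance to the nearest denser sample) to pick peaks, and treats a fixed radius around each peak as the high-confidence region. The function name `density_peak_labels` and the parameters `k`, `d_c`, `r_conf`, and `ask_label` are hypothetical.

```python
import math


def density_peak_labels(points, k, d_c, r_conf, ask_label):
    """Propagate labels from k density peaks to nearby samples.

    points    : list of coordinate tuples
    k         : number of peaks to extract (assumed known in advance)
    d_c       : kernel width of the density estimate (assumption)
    r_conf    : radius of the high-confidence region (assumption)
    ask_label : callable index -> label, a stand-in for manual labeling
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: Gaussian-weighted count of the other samples.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit samples from densest to sparsest (index breaks ties).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest denser sample; parent: that sample.
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest sample has no denser neighbour
        else:
            j = min(order[:pos], key=lambda m: dist[i][m])
            delta[i], parent[i] = dist[i][j], j

    # Peaks: samples that are both dense and far from any denser sample.
    peaks = sorted(order, key=lambda i: -(rho[i] * delta[i]))[:k]

    # Each remaining sample joins the cluster of its denser parent.
    cluster = [-1] * n
    for p in peaks:
        cluster[p] = p
    for i in order:
        if cluster[i] == -1:
            cluster[i] = cluster[parent[i]]

    # Only the peaks are labeled manually; high-confidence samples inherit.
    labels = {p: ask_label(p) for p in peaks}
    low_confidence = []
    for i in range(n):
        p = cluster[i]
        if dist[i][p] <= r_conf:
            labels[i] = labels[p]
        else:
            low_confidence.append(i)
    return labels, peaks, low_confidence


# Toy data: two tight groups and one isolated sample (made up for illustration).
points = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
          (5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9),
          (2.5, 2.5)]
truth = ["A"] * 5 + ["B"] * 5 + ["?"]
labels, peaks, low = density_peak_labels(
    points, k=2, d_c=0.5, r_conf=0.5, ask_label=lambda i: truth[i]
)
```

In this toy run only the two peak samples are sent to the `ask_label` stand-in; the other members of each tight group inherit the peak's label, and the isolated sample is returned as low confidence rather than labeled. How low-confidence samples are ultimately handled is left open here, since the abstract does not say.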

Results and conclusion: The results suggest that the proposed SSDCCR (Semi-Supervised Density-Based Clustering with a Confidence Region) outperforms existing algorithms across multiple health datasets, with a significant increase of at least 2 percent in accuracy. The approach is scalable to larger datasets and memory efficient, with lower complexity.

Keywords: Confidence regions; Density-based clustering; Peak samples; Semi-supervised learning.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
