A Cheap Feature Selection Approach for the K-Means Algorithm

Marco Capo, Aritz Perez, Jose A Lozano

PMID: 32598285
DOI: 10.1109/TNNLS.2020.3002576

Marco Capo et al. IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2195-2208.

doi: 10.1109/TNNLS.2020.3002576. Epub 2021 May 3.

Abstract

The increase in the number of features that need to be analyzed in a wide variety of areas, such as genome sequencing, computer vision, or sensor networks, represents a challenge for the K-means algorithm. In this regard, different dimensionality reduction approaches for the K-means algorithm have been designed recently, leading to algorithms that have proved to generate competitive clusterings. Unfortunately, most of these techniques tend to have fairly high computational costs and/or might not be easy to parallelize. In this article, we propose a fully parallelizable feature selection technique intended for the K-means algorithm. The proposal is based on a novel feature relevance measure that is closely related to the K-means error of a given clustering. Given a disjoint partition of the features, the technique consists of obtaining a clustering for each subset of features and selecting the m features with the highest relevance measure. The computational cost of this approach is just O(m·max{n·K, log m}) per subset of features. We additionally provide a theoretical analysis of the quality of the solution obtained via our proposal and empirically analyze its performance with respect to well-known feature selection and feature extraction techniques. This analysis shows that our proposal consistently obtains results with lower K-means error than all the considered feature selection techniques (Laplacian scores, maximum variance, multicluster feature selection, and random selection), while also requiring similar or lower computational times than these approaches. Moreover, when compared with feature extraction techniques, such as random projections, the proposed approach also shows a noticeable improvement in both error and computational time.
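
The partition-then-select procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the relevance measure, so the sketch substitutes a stand-in: the per-feature between-cluster sum of squares, that is, how much of a feature's total squared deviation the clustering explains. The function names (`kmeans`, `relevance`, `select_features`), the contiguous equal-sized partition, and the choice to rank the m best features across all subsets are all hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm on X (n x d); returns a label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def relevance(X, labels, k):
    """Stand-in relevance (an assumption, not the paper's measure):
    per-feature total sum of squares minus within-cluster sum of squares."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in range(k):
        members = X[labels == c]
        if len(members):
            within += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    return total - within


def select_features(X, k, m, n_blocks, seed=0):
    """Cluster each disjoint feature subset, score its features,
    and return the sorted indices of the m highest-scoring features."""
    d = X.shape[1]
    blocks = np.array_split(np.arange(d), n_blocks)  # disjoint partition
    scores = np.empty(d)
    for block in blocks:  # iterations are independent, so they could run in parallel
        labels = kmeans(X[:, block], k, seed=seed)
        scores[block] = relevance(X[:, block], labels, k)
    top = np.argsort(scores)[::-1][:m]
    return np.sort(top)
```

A call such as `select_features(X, k=2, m=2, n_blocks=2)` returns the indices of the retained columns, after which K-means can be run on `X[:, selected]` alone. Because each subset is clustered and scored without reference to the others, the loop over `blocks` is the natural place to distribute work across processes.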
