Biomed Res Int. 2017;2017:7560807. doi: 10.1155/2017/7560807. Epub 2017 Dec 11.

K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data


Hadi Raeisi Shahraki et al. Biomed Res Int. 2017.

Abstract

K nearest neighbors (KNN) is one of the simplest nonparametric classifiers, but in high dimensional settings its accuracy suffers from nuisance features. In this study, we propose K important neighbors (KIN) as a novel approach to binary classification in high dimensional problems. To avoid the curse of dimensionality, we fit smoothly clipped absolute deviation (SCAD) penalized logistic regression at an initial stage and account for the importance of each feature in the dissimilarity measure by imposing feature contributions, expressed as a function of the SCAD coefficients, on the Euclidean distance. This hybrid dissimilarity measure, which combines information about both features and distances, enjoys the good properties of SCAD penalized regression and of KNN simultaneously. In simulation studies, KIN outperformed KNN in terms of both accuracy and dimension reduction. Because the dissimilarity measure exploits the oracle property of SCAD penalized regression, the proposed approach eliminated nearly all noninformative features. In very sparse settings, KIN also outperformed support vector machine (SVM) and random forest (RF), which are widely regarded as among the strongest off-the-shelf classifiers.
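Below is a minimal sketch of the two-stage KIN idea described in the abstract, not the authors' implementation. The abstract says only that feature contributions are "a function of SCAD coefficients"; the concrete choice here, a weight w_j proportional to |beta_j| in a weighted Euclidean dissimilarity d(x, z) = sqrt(sum_j w_j (x_j - z_j)^2), is an assumption. scikit-learn provides no SCAD solver, so an L1-penalized logistic regression stands in for the SCAD stage, and fit_kin and predict_kin are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def fit_kin(X_train, y_train, k=5, C=0.1):
    # Stage 1: penalized logistic regression for feature screening.
    # The paper uses SCAD; an L1 (lasso) penalty is a rough stand-in here.
    lr = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    lr.fit(X_train, y_train)
    beta = np.abs(lr.coef_.ravel())  # per-feature importance (assumed form)

    # Normalize to weights; fall back to uniform weights if every
    # coefficient was shrunk to zero.
    total = beta.sum()
    weights = beta / total if total > 0 else np.full_like(beta, 1.0 / beta.size)

    # Stage 2: KNN under the weighted Euclidean dissimilarity.
    # Rescaling each column by sqrt(w_j) makes ordinary Euclidean KNN
    # equivalent to d(x, z) = sqrt(sum_j w_j (x_j - z_j)^2); features
    # with a zero coefficient drop out of the distance entirely, which
    # is how noninformative features are discarded.
    scale = np.sqrt(weights)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train * scale, y_train)
    return knn, scale

def predict_kin(model, X_test):
    knn, scale = model
    return knn.predict(X_test * scale)
```

With this scaling trick, any off-the-shelf KNN implementation can serve as the second stage; only the feature-weighting step differs from plain KNN.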


Figures

Figure 1. K important neighbors (KIN) algorithm for classification.
Figure 2. Misclassification rate of the proposed KIN versus SVM, RF, and KNN for 100, 300, and 500 features (top to bottom); N indicates sample size.
