K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
- PMID: 29376076
- PMCID: PMC5742505
- DOI: 10.1155/2017/7560807
Abstract
K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings its accuracy is affected by nuisance features. In this study, we proposed the K important neighbors (KIN) as a novel approach for binary classification in high dimensional problems. To avoid the curse of dimensionality, we implemented smoothly clipped absolute deviation (SCAD) logistic regression at the initial stage and incorporated the importance of each feature into the dissimilarity measure by imposing each feature's contribution, as a function of its SCAD coefficient, on the Euclidean distance. This hybrid dissimilarity measure, which combines information from both features and distances, enjoys the good properties of SCAD penalized regression and KNN simultaneously. In simulation studies, KIN performed well compared with KNN in terms of both accuracy and dimension reduction. The proposed approach was capable of eliminating nearly all of the noninformative features because it uses the oracle property of SCAD penalized regression in constructing the dissimilarity measure. In very sparse settings, KIN also outperformed support vector machine (SVM) and random forest (RF), which are regarded as among the best classifiers.
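The coefficient-weighted distance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact function of the SCAD coefficients used, so the sketch assumes the weights are the normalized absolute coefficients, and it takes the coefficient vector as given rather than fitting a SCAD logistic regression. The function name `kin_predict` and the toy data are hypothetical.

```python
import numpy as np


def kin_predict(X_train, y_train, X_test, coefs, k=3):
    """Binary KNN vote under a coefficient-weighted Euclidean distance.

    Assumption (not from the paper): feature weights are |coef| / sum(|coef|).
    Features with a zero coefficient contribute nothing to the distance.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    w = np.abs(np.asarray(coefs, dtype=float))
    if w.sum() == 0:
        raise ValueError("all coefficients are zero; no feature is informative")
    w = w / w.sum()

    # Pairwise differences, shape (n_test, n_train, n_features).
    diff = X_test[:, None, :] - X_train[None, :, :]
    # Weighted Euclidean distance, shape (n_test, n_train).
    dist = np.sqrt((w * diff ** 2).sum(axis=2))

    # Indices of the k nearest training points for each test point.
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Majority vote over 0/1 labels (use odd k to avoid ties).
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X_train = [[0.0, 100], [0.1, -100], [0.2, 50],
           [1.0, 100], [0.9, -100], [1.1, -50]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.05, -50], [0.95, 50]]

# Hypothetical sparse coefficients: the noise feature has been shrunk to zero.
weighted = kin_predict(X_train, y_train, X_test, coefs=[2.0, 0.0], k=3)
# Equal weights reduce to ordinary (rescaled) Euclidean KNN.
unweighted = kin_predict(X_train, y_train, X_test, coefs=[1.0, 1.0], k=3)
print(weighted, unweighted)
```

In this toy setting the zero weight removes the noise feature from the distance entirely, which is the mechanism the abstract credits for dimension reduction. In practice the coefficients would come from a fitted SCAD-penalized logistic regression.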