Probabilistic Nearest Neighbors Classification

Bruno Fava¹, Paulo C Marques F², Hedibert F Lopes²

Affiliations

¹ Department of Economics, Northwestern University, Evanston, IL 60208, USA.
² Insper Institute of Education and Research, Rua Quatá 300, São Paulo 04546-042, Brazil.

PMID: 38248165
PMCID: PMC10814015
DOI: 10.3390/e26010039

Probabilistic Nearest Neighbors Classification

Bruno Fava et al. Entropy (Basel). 2023.

. 2023 Dec 30;26(1):39.

doi: 10.3390/e26010039.

Authors

Bruno Fava¹, Paulo C Marques F², Hedibert F Lopes²

Affiliations

¹ Department of Economics, Northwestern University, Evanston, IL 60208, USA.
² Insper Institute of Education and Research, Rua Quatá 300, São Paulo 04546-042, Brazil.

PMID: 38248165
PMCID: PMC10814015
DOI: 10.3390/e26010039

Abstract

Analysis of the currently established Bayesian nearest neighbors classification model points to a connection between the computation of its normalizing constant and issues of NP-completeness. An alternative predictive model constructed by aggregating the predictive distributions of simpler nonlocal models is proposed, and analytic expressions for the normalizing constants of these nonlocal models are derived, ensuring polynomial time computation without approximations. Experiments with synthetic and real datasets showcase the predictive performance of the proposed predictive model.

Keywords: NP-completeness; nearest neighbors classification; probabilistic machine learning.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
Weighted undirected nonplanar graphs associated with a specific neighborhood structure, determined by the three nearest neighbors according to Euclidean distance, and the sizes of three different cuts, obtained by summing the weights of all edges linking two vertices of different colors. On each graph, light gray and black edges have weights with values one and two, respectively, according to the adjacency matrix B. Black and white vertices correspond to class labels being equal to one and zero, respectively.

**Figure 2**
Distance matrix D and directed graphs showing the neighborhood structure for the nonlocal models with $r = 1$ and $r = 2$ .

**Figure 3**
Two impossible directed graphs for a nonlocal model. On the left graph, the white vertex has outdegree equal to zero. On the right graph, the white vertex has outdegree equal to two. Both graphs contradict the fact that each training sample unit has exactly one r-th nearest neighbor. The conclusion is that in a directed graph describing the neighborhood structure of a nonlocal model each subgraph contains exactly one simple cycle.

**Figure 4**
Example of the reduction process for the summation on each subgraph.

**Figure 5**
Training sample for the Ripley synthetic dataset. Circles and triangles indicate the two possible values of the response variable. The heatmap represents the probabilities of class “triangle” given by the predictive model pnnclass, ranging from pure cyan (probability zero) to pure orange (probability one).

**Figure 6**
ROC curves for the Ripley and the Pima Indian datasets. Annotated summaries correspond to the probability threshold maximizing the sum of the classifier sensitivity and specificity.

**Figure 7**
Testing sample scores on the first two principal components of the predictors in the Forensic Glass dataset. The different shapes indicate the four possible values of the response variable. The colors blue or red mean, respectively, that the corresponding testing sample units were classified correctly or incorrectly.

See this image and copyright information in PMC

References

1. Fix E., Hodges J.L. Discriminatory analysis—Nonparametric discrimination: Consistency properties. Int. Stat. Rev. 1989;57:238–247. doi: 10.2307/1403797. - DOI
1. Devroye L., Györfi L., Lugosi G. A Probabilistic Theory of Pattern Recognition. Volume 31 Springer; Berlin, Germany: 2013.
1. Biau G., Devroye L. Lectures on the Nearest Neighbor Method. Volume 246 Springer; Berlin, Germany: 2015.
1. Holmes C., Adams N. A probabilistic nearest neighbour method for statistical pattern recognition. J. R. Stat. Soc. B Stat. Methodol. 2002;64:295–306. doi: 10.1111/1467-9868.00338. - DOI
1. Cucala L., Marin J.M., Robert C.P., Titterington D.M. A Bayesian Reassessment of Nearest-Neighbor Classification. J. Am. Stat. Assoc. 2009;104:263–273. doi: 10.1198/jasa.2009.0125. - DOI

LinkOut - more resources

Full Text Sources
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Probabilistic Nearest Neighbors Classification

Affiliations

Probabilistic Nearest Neighbors Classification

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

LinkOut - more resources

Full Text Sources

Miscellaneous