Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Dec 30;26(1):39.
doi: 10.3390/e26010039.

Probabilistic Nearest Neighbors Classification

Affiliations

Probabilistic Nearest Neighbors Classification

Bruno Fava et al. Entropy (Basel). .

Abstract

Analysis of the currently established Bayesian nearest neighbors classification model points to a connection between the computation of its normalizing constant and issues of NP-completeness. An alternative predictive model constructed by aggregating the predictive distributions of simpler nonlocal models is proposed, and analytic expressions for the normalizing constants of these nonlocal models are derived, ensuring polynomial time computation without approximations. Experiments with synthetic and real datasets showcase the predictive performance of the proposed predictive model.

Keywords: NP-completeness; nearest neighbors classification; probabilistic machine learning.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Figure 1
Weighted undirected nonplanar graphs associated with a specific neighborhood structure, determined by the three nearest neighbors according to Euclidean distance, and the sizes of three different cuts, obtained by summing the weights of all edges linking two vertices of different colors. On each graph, light gray and black edges have weights with values one and two, respectively, according to the adjacency matrix B. Black and white vertices correspond to class labels being equal to one and zero, respectively.
Figure 2
Figure 2
Distance matrix D and directed graphs showing the neighborhood structure for the nonlocal models with r=1 and r=2.
Figure 3
Figure 3
Two impossible directed graphs for a nonlocal model. On the left graph, the white vertex has outdegree equal to zero. On the right graph, the white vertex has outdegree equal to two. Both graphs contradict the fact that each training sample unit has exactly one r-th nearest neighbor. The conclusion is that in a directed graph describing the neighborhood structure of a nonlocal model each subgraph contains exactly one simple cycle.
Figure 4
Figure 4
Example of the reduction process for the summation on each subgraph.
Figure 5
Figure 5
Training sample for the Ripley synthetic dataset. Circles and triangles indicate the two possible values of the response variable. The heatmap represents the probabilities of class “triangle” given by the predictive model pnnclass, ranging from pure cyan (probability zero) to pure orange (probability one).
Figure 6
Figure 6
ROC curves for the Ripley and the Pima Indian datasets. Annotated summaries correspond to the probability threshold maximizing the sum of the classifier sensitivity and specificity.
Figure 7
Figure 7
Testing sample scores on the first two principal components of the predictors in the Forensic Glass dataset. The different shapes indicate the four possible values of the response variable. The colors blue or red mean, respectively, that the corresponding testing sample units were classified correctly or incorrectly.

References

    1. Fix E., Hodges J.L. Discriminatory analysis—Nonparametric discrimination: Consistency properties. Int. Stat. Rev. 1989;57:238–247. doi: 10.2307/1403797. - DOI
    1. Devroye L., Györfi L., Lugosi G. A Probabilistic Theory of Pattern Recognition. Volume 31 Springer; Berlin, Germany: 2013.
    1. Biau G., Devroye L. Lectures on the Nearest Neighbor Method. Volume 246 Springer; Berlin, Germany: 2015.
    1. Holmes C., Adams N. A probabilistic nearest neighbour method for statistical pattern recognition. J. R. Stat. Soc. B Stat. Methodol. 2002;64:295–306. doi: 10.1111/1467-9868.00338. - DOI
    1. Cucala L., Marin J.M., Robert C.P., Titterington D.M. A Bayesian Reassessment of Nearest-Neighbor Classification. J. Am. Stat. Assoc. 2009;104:263–273. doi: 10.1198/jasa.2009.0125. - DOI

LinkOut - more resources