On Supervised Classification of Feature Vectors with Independent and Non-Identically Distributed Elements

Farzad Shahrivari et al. Entropy (Basel). 2021 Aug 13;23(8):1045. doi: 10.3390/e23081045.

Abstract

In this paper, we investigate the problem of classifying feature vectors whose elements are mutually independent but non-identically distributed and take values from a finite alphabet. First, we motivate the importance of this problem. Next, we propose a classifier and derive an analytical upper bound on its error probability. We show that the error probability tends to zero as the length of the feature vectors grows, even when only one training feature vector per label is available. This establishes that, for this important problem, at least one asymptotically optimal classifier exists. Finally, we provide numerical examples showing that the proposed classifier outperforms conventional classification algorithms when the amount of training data is small and the feature vectors are sufficiently long.
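
The abstract does not spell out the proposed classifier, so the sketch below only illustrates the problem setup it describes: each position i of a length-n feature vector has its own categorical distribution p_i(y | x) over a finite alphabet, making the elements mutually independent but non-identically distributed. As a stand-in classifier it uses the naive Bayes baseline the paper compares against (per-position categorical estimates with Laplace smoothing). All names and parameters here (make_model, fit_naive_bayes, T, k, the alphabet size) are illustrative assumptions, not the paper's notation.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_model(n, k, num_labels, rng):
        # One categorical distribution p_i(y | x) per position i and label x,
        # so elements are mutually independent but non-identically distributed.
        p = rng.random((num_labels, n, k))
        return p / p.sum(axis=2, keepdims=True)        # shape (labels, n, k)

    def sample(p, label, rng):
        # Draw one length-n feature vector under the given label,
        # position by position from its own distribution.
        n, k = p.shape[1], p.shape[2]
        return np.array([rng.choice(k, p=p[label, i]) for i in range(n)])

    def fit_naive_bayes(train, num_labels, n, k, alpha=1.0):
        # Per-position categorical estimates with Laplace smoothing
        # (a baseline comparison method, not the paper's classifier).
        counts = np.full((num_labels, n, k), alpha)
        for x, y in train:
            counts[x, np.arange(n), y] += 1.0
        return counts / counts.sum(axis=2, keepdims=True)

    def predict(p_hat, y):
        # Sum the per-position log-likelihoods and pick the best label.
        loglik = np.log(p_hat[:, np.arange(y.size), y]).sum(axis=1)
        return int(np.argmax(loglik))

    num_labels, k, T = 2, 4, 100          # T training vectors per label
    for n in (10, 50, 200):               # feature-vector length
        p = make_model(n, k, num_labels, rng)
        train = [(x, sample(p, x, rng))
                 for x in range(num_labels) for _ in range(T)]
        p_hat = fit_naive_bayes(train, num_labels, n, k)
        errs = [predict(p_hat, sample(p, x, rng)) != x
                for _ in range(250) for x in range(num_labels)]
        print(f"n={n:4d}  empirical error ~ {np.mean(errs):.3f}")

In this toy simulation the true model is product-form, so the per-position log-likelihood sum concentrates as n grows and the empirical error shrinks with the feature-vector length, mirroring the asymptotic behaviour the abstract claims for the proposed classifier. With T=1 the smoothed estimates become very crude, which is exactly the regime where the paper reports its classifier outperforming such baselines.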

Keywords: analytical error probability; independent and non-identically distributed features; supervised classification.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. A typical structural modelling of the classification learning problem.
Figure 2. An alternative modelling of the classification learning problem.
Figure 3. An alternative modelling of the classification learning problem when the elements of Y^n(X) are mutually independent but non-identically distributed (i.non-i.d.).
Figure 4. Comparison of the error probability of the naive Bayes classifier, KNN, and the proposed classifier.
Figure 5. Comparison of the error probability of the naive Bayes classifier, KNN, and the proposed classifier.
Figure 6. Comparison of the error probability of the proposed classifier for different values of r when |Y|=6. The corresponding theoretical upper bounds for each value of r are also shown.
Figure 7. Illustration of the probability distributions p_i(y_i|x_1) (upper figure) and p_i(y_i|x_2) (lower figure), for i=1,2,…,n.
Figure 8. Comparison of the error probability of the naive Bayes classifier, KNN, and the proposed classifier (T=1).
Figure 9. Comparison of the error probability of the naive Bayes classifier and the proposed classifier (T=100).
Figure 10. Illustration of the probability distributions p_i(y_i|x_1) (upper figure) and p_i(y_i|x_2) (lower figure), for i=1,2,…,n.
Figure 11. Comparison of the error probability of the naive Bayes classifier and the proposed classifier (T=1).
