IEEE Trans Pattern Anal Mach Intell. 2011 Mar;33(3):631-8. doi: 10.1109/TPAMI.2010.173.

Kernel optimization in discriminant analysis

Di You et al. IEEE Trans Pattern Anal Mach Intell. 2011 Mar.

Abstract

Kernel mapping is one of the most widely used approaches for deriving nonlinear classifiers. The idea is to use a kernel function that maps the original, nonlinearly separable problem to a space of intrinsically larger dimensionality in which the classes are linearly separable. A major problem in the design of kernel methods is finding the kernel parameters that make the problem linear in the mapped representation. This paper derives the first criterion that specifically aims to find a kernel representation in which the Bayes classifier becomes linear. We illustrate how this result can be successfully applied to several kernel discriminant analysis algorithms. Experimental results on a large number of databases and classifiers demonstrate the utility of the proposed approach. The paper also shows (theoretically and experimentally) that a kernel version of Subclass Discriminant Analysis yields the highest recognition rates.
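As a rough, illustrative sketch of this setting (not the criterion derived in the paper): the snippet below maps a nonlinearly separable dataset through an approximate RBF kernel feature map and runs a linear discriminant on the mapped features, choosing the kernel parameter gamma by cross-validated accuracy as a simple stand-in for the paper's optimization. The dataset, the Nystroem approximation, and all parameter values are assumptions made for the example.

```python
# Sketch: kernel map + linear discriminant, with the kernel parameter chosen
# by cross-validation (a stand-in for the criterion derived in the paper).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_moons(n_samples=400, noise=0.15, random_state=0)  # nonlinearly separable data

pipe = Pipeline([
    ("kmap", Nystroem(kernel="rbf", n_components=100, random_state=0)),  # approximate kernel map
    ("lda", LinearDiscriminantAnalysis()),                               # linear classifier in mapped space
])

# Search over the RBF parameter gamma -- the "kernel parameter" of the abstract.
search = GridSearchCV(pipe, {"kmap__gamma": np.logspace(-2, 2, 9)}, cv=5)
search.fit(X, y)
print("best gamma:", search.best_params_["kmap__gamma"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```

With an appropriate gamma, the linear discriminant in the mapped space separates the two interleaved half-moon classes far better than any hyperplane in the original two dimensions could.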

Figures

Fig. 1
Here we show an example of two nonlinearly separable class distributions, each consisting of three subclasses. (a) Classification boundary of LDA. (b) SDA's solution. Note how this solution is piecewise linear (i.e., linear when separating subclasses, but nonlinear when classifying classes). (c) KDA's solution.
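A hedged sketch of this setting (the subclass means and spreads below are illustrative, not the figure's data): each class is a mixture of three Gaussian subclasses placed so that no single hyperplane separates the two classes, and a plain linear discriminant is compared against a kernel-mapped one.

```python
# Sketch: two classes, three Gaussian subclasses each, arranged so the classes
# are not linearly separable; compare LDA with a kernel-mapped discriminant.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

def sample_class(means, n_per=60, spread=0.3):
    # One Gaussian blob of n_per points around each subclass mean.
    return np.vstack([rng.normal(m, spread, size=(n_per, 2)) for m in means])

X0 = sample_class([(-2, 0), (0, 2), (2, 0)])   # class 0: three subclasses
X1 = sample_class([(-1, 1), (1, 1), (0, -1)])  # class 1: interleaved subclasses
X = np.vstack([X0, X1])
y = np.r_[np.zeros(len(X0)), np.ones(len(X1))]

lda = LinearDiscriminantAnalysis().fit(X, y)
kernel_lda = make_pipeline(
    Nystroem(kernel="rbf", gamma=1.0, random_state=0),
    LinearDiscriminantAnalysis(),
).fit(X, y)

print("LDA training accuracy:         ", round(lda.score(X, y), 3))
print("kernel + LDA training accuracy:", round(kernel_lda.score(X, y), 3))
```

The single linear boundary of LDA cannot separate interleaved subclasses like these, while the kernel-mapped discriminant can follow a nonlinear boundary, which is the contrast panels (a) and (c) illustrate.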
Fig. 2
Three examples of the use of the homoscedastic criterion, Q1. The examples are for two Normal distributions with equal covariance matrices up to scale and rotation. (a) The value of Q1 decreases as the angle θ increases; the 2D rotation θ between the two distributions is on the x axis, and the value of Q1 is on the y axis. (b) When θ = 0°, the two distributions are homoscedastic, and Q1 takes its maximum value of .5. Note how for distributions that are close to homoscedastic (i.e., θ ≈ 0°), the value of the criterion remains high. (c) When θ = 45°, the value has decreased to about .4. (d) By θ = 90°, Q1 ≈ .3.
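The definition of Q1 is not reproduced on this page. As an assumed stand-in with the properties described in the caption (bounded above by .5, with the maximum attained exactly when the two covariances are equal), the sketch below evaluates tr(Σ1Σ2) / (tr(Σ1²) + tr(Σ2²)) as one Normal distribution's covariance is rotated against the other's.

```python
# Sketch: a homoscedasticity measure that peaks at .5 when the two covariance
# matrices are equal and decreases as one is rotated away from the other.
# This exact formula is an assumption, not necessarily the paper's Q1.
import numpy as np

def q1(S1, S2):
    # Bounded by .5 (Cauchy-Schwarz); equals .5 iff S1 == S2.
    return np.trace(S1 @ S2) / (np.trace(S1 @ S1) + np.trace(S2 @ S2))

def rotate(S, theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ S @ R.T

S1 = np.diag([4.0, 1.0])                  # an elongated Gaussian covariance
for deg in (0, 45, 90):
    S2 = rotate(S1, np.deg2rad(deg))      # same covariance, rotated by theta
    print(f"theta = {deg:>2} deg  Q1 = {q1(S1, S2):.3f}")
```

For this particular covariance, the printed values decrease monotonically from .5 as θ grows from 0° to 90°, mirroring the trend described in panels (a) through (d).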
Fig. 3
Here we show a two-class classification problem with multi-modal class distributions. When σ = 1, both KDA (a) and KSDA (b) generate solutions with small training error. (c) However, when the model complexity is low (σ = 3), KDA fails. (d) KSDA's solution resolves this problem with piecewise smooth, nonlinear classifiers.
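KSDA itself is not spelled out on this page; the sketch below only illustrates the subclass idea under stated assumptions: each class is split into subclasses with k-means (an assumed substitute for the paper's subclass division), a kernel-mapped linear discriminant is trained on the subclass labels, and predicted subclasses are mapped back to their parent classes. The RBF width σ is converted to scikit-learn's gamma as 1/(2σ²).

```python
# Sketch of the subclass idea behind KSDA (assumptions: k-means subclasses,
# Nystroem RBF feature map, LDA as the linear discriminant).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline

def fit_subclass_discriminant(X, y, n_sub=3, sigma=3.0):
    """Train a kernel-mapped discriminant on subclass labels; return the
    classifier and a map from predicted subclass back to parent class."""
    gamma = 1.0 / (2.0 * sigma ** 2)            # RBF width sigma -> gamma
    sub_labels = np.empty(len(y), dtype=int)
    parent = {}
    next_id = 0
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        km = KMeans(n_clusters=n_sub, n_init=10, random_state=0).fit(X[idx])
        sub_labels[idx] = km.labels_ + next_id   # globally unique subclass ids
        for k in range(n_sub):
            parent[next_id + k] = c
        next_id += n_sub
    clf = make_pipeline(Nystroem(kernel="rbf", gamma=gamma, random_state=0),
                        LinearDiscriminantAnalysis()).fit(X, sub_labels)
    to_class = np.vectorize(parent.get)          # subclass id -> parent class
    return clf, to_class

# Usage on any multi-modal two-class dataset (X, y), e.g. the one sketched
# under Fig. 1:
#   clf, to_class = fit_subclass_discriminant(X, y, n_sub=3, sigma=3.0)
#   class_predictions = to_class(clf.predict(X))
```

Because the discriminant only has to separate compact subclasses rather than whole multi-modal classes, a smoother kernel (larger σ, lower complexity) can still yield a low-error, piecewise smooth class boundary, which is the behavior panel (d) illustrates.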
