Support vector machine implementations for classification & clustering

Stephen Winters-Hilt et al. BMC Bioinformatics. 2006 Sep 6;7 Suppl 2(Suppl 2):S4. doi: 10.1186/1471-2105-7-S2-S4.
Abstract

Background: We describe Support Vector Machine (SVM) applications to classification and clustering of channel current data. SVMs are variational-calculus-based methods constrained to perform structural risk minimization (SRM), i.e., they provide noise-tolerant solutions for pattern recognition. The SVM approach encapsulates a significant amount of model-fitting information in the choice of its kernel. In work thus far, novel information-theoretic kernels have been successfully employed for notably better performance than standard kernels. Currently there are two approaches for implementing multiclass SVMs. The first, external multiclass, arranges several binary classifiers as a decision tree in which each leaf corresponds to a unique class, so that together they perform a single multiclass decision. The second, internal multiclass, solves a single optimization problem over the entire data set (with multiple hyperplanes).
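As a concrete illustration of the two strategies, here is a minimal sketch (not the implementation used in this work; scikit-learn stands in for the SMO variants discussed later, and a one-vs-rest arrangement stands in for the optimized decision tree):

```python
# Hypothetical illustration of "external" vs. "internal" multiclass SVMs.
# External: several binary SVMs combined (one-vs-rest here, rather than the
#           optimized decision tree used in the paper).
# Internal: a single optimization with multiple hyperplanes (Crammer-Singer).
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC, LinearSVC

X, y = make_blobs(n_samples=300, centers=4, random_state=0)

# External multiclass: one binary SVM per class.
external = OneVsRestClassifier(SVC(kernel="rbf", C=10.0, gamma="scale"))
external.fit(X, y)

# Internal multiclass: one joint optimization over all class hyperplanes.
internal = LinearSVC(multi_class="crammer_singer", C=10.0)
internal.fit(X, y)

print("external accuracy:", external.score(X, y))
print("internal accuracy:", internal.score(X, y))
```

The Crammer-Singer formulation above is one example of an internal approach: all class hyperplanes are obtained from a single optimization rather than from independently trained binary machines.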

Results: Each SVM approach encapsulates a significant amount of model-fitting information in its choice of kernel. In work thus far, novel information-theoretic kernels were successfully employed for notably better performance than standard kernels. Two SVM approaches to multiclass discrimination are described: (1) internal multiclass (with a single optimization), and (2) external multiclass (using an optimized decision tree). We describe the benefits of the internal-SVM approach, along with further refinements to the internal-multiclass SVM algorithms that offer significant improvement in training time without sacrificing accuracy. In situations where the data are not clearly separable, making for poor discrimination, signal clustering is used to provide robust and useful information; to this end, novel SVM-based clustering methods are also described. As with classification, there are internal and external SVM clustering algorithms, both of which are briefly described.


Figures

Figure 1
A sketch of the hyperplane separability heuristic for SVM binary classification. An SVM is trained to find an optimal hyperplane that separates positive and negative instances, while also being constrained by the structural risk minimization (SRM) criterion, which here manifests as the hyperplane having a thickness, or "margin," that is made as large as possible in seeking a separating hyperplane. A benefit of using SRM is much less complication due to overfitting (a common problem with Neural Network discrimination approaches). Given its geometric expression, it is not surprising that a key construct in the SVM formulation (via the choice of kernel) is the notion of "nearness" between instances (or nearness to the hyperplane, where it gives a measure of confidence in the classification, i.e., instances further from the decision hyperplane are called with greater confidence). Most notions of nearness explored in this context have stayed with the geometric paradigm and are known as "distance kernels," one example being the familiar Gaussian kernel, which is based on the Euclidean distance: $K_{\mathrm{Gaussian}}(x,y) = \exp(-D_{\mathrm{Eucl}}(x,y)^2/2\sigma^2)$, where $D_{\mathrm{Eucl}}(x,y) = [\sum_k (x_k - y_k)^2]^{1/2}$ is the usual Euclidean distance. Those kernels are used in the signal pattern recognition analysis in Figure 8, along with a new class of kernels, "divergence kernels," based on a notion of nearness appropriate when comparing probability distributions (or probability feature vectors). The main example of this is the Entropic Divergence Kernel: $K_{\mathrm{Entropic}}(x,y) = \exp(-D_{\mathrm{Entropic}}(x,y)^2/2\sigma^2)$, where $D_{\mathrm{Entropic}}(x,y) = D(x\|y) + D(y\|x)$ and $D(\cdot\|\cdot)$ is the Kullback-Leibler divergence (or relative entropy) between x and y.
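The two kernels defined in the caption can be written out directly. The following sketch is illustrative only: the probability normalization, the small epsilon guarding log(0), and the choice of sigma are assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K_Gaussian(x, y) = exp(-D_Eucl(x, y)^2 / (2 sigma^2))."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for discrete probability vectors (eps avoids log(0))."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def entropic_kernel(p, q, sigma=1.0):
    """K_Entropic(p, q) = exp(-D_Entropic(p, q)^2 / (2 sigma^2)),
    with D_Entropic(p, q) = D(p || q) + D(q || p)."""
    d = kl_divergence(p, q) + kl_divergence(q, p)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

# Example: two blockade-level histograms treated as probability feature vectors.
p = [0.1, 0.4, 0.3, 0.2]
q = [0.2, 0.3, 0.3, 0.2]
print(gaussian_kernel(p, q, sigma=0.5), entropic_kernel(p, q, sigma=0.5))
```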
Figure 2
(a) Panel (A) shows a nanopore device based on the α-hemolysin channel, which has been used for analysis of single DNA molecules such as the ssDNA shown, and dsDNA; a nine base-pair DNA hairpin is shown in (B) superimposed on the channel geometry. The channel current blockade trace for the nine base-pair DNA hairpin from (B) is shown in (C). (b) shows the signal processing architecture that was used to classify DNA hairpins with this approach: signal acquisition was performed using a time-domain, thresholding Finite State Automaton, followed by adaptive pre-filtering using a wavelet-domain Finite State Automaton. Hidden Markov Model processing with Expectation-Maximization was used for feature extraction on acquired channel blockades. Classification was then done by Support Vector Machine on five DNA molecules: four DNA hairpin molecules with nine base-pair stem lengths that differed only in their blunt-ended DNA termini, and an eight base-pair DNA hairpin. The accuracy shown is obtained upon completing the 15th single-molecule sampling/classification (in approx. 6 seconds), where SVM-based rejection of noisy signals was employed.
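The signal-acquisition stage described in the caption is a time-domain, thresholding Finite State Automaton. A minimal two-state sketch of that idea follows; the threshold fraction, minimum event duration, and synthetic trace are illustrative assumptions rather than the settings used in the paper, and the wavelet-domain pre-filtering and HMM/EM feature-extraction stages are not shown.

```python
# Minimal two-state FSA for blockade acquisition (illustrative parameters).
# BASELINE -> BLOCKADE when the current drops below a threshold;
# BLOCKADE -> BASELINE when it recovers; short events are discarded.
import numpy as np

def acquire_blockades(current, open_channel_level, threshold_frac=0.7,
                      min_samples=50):
    """Return (start, end) index pairs for candidate blockade events."""
    threshold = threshold_frac * open_channel_level
    events, start, in_blockade = [], None, False
    for i, sample in enumerate(current):
        if not in_blockade and sample < threshold:
            in_blockade, start = True, i          # enter blockade state
        elif in_blockade and sample >= threshold:
            in_blockade = False                   # return to baseline state
            if i - start >= min_samples:
                events.append((start, i))
    return events

# Synthetic trace: open channel near 120 pA with one deep, sustained blockade.
rng = np.random.default_rng(0)
trace = 120 + rng.normal(0, 2, 2000)
trace[800:1200] = 25 + rng.normal(0, 2, 400)
print(acquire_blockades(trace, open_channel_level=120))
```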
Figure 3
Comparative results on the performance of kernels and algorithmic variants. The classification is between two DNA hairpins (in terms of features from the blockade signals they produce when occluding ion flow through a nanometer-scale channel). Implementations: WH SMO (W); Platt SMO (P); Keerthi1 (1); and Keerthi2 (2). Kernels: Absdiff (a); Entropic (e); and Gaussian (g). The best algorithm/kernel combination on this and other channel blockade data studied has consistently been the WH SMO variant with the Absdiff and Entropic kernels. Another benefit of the WH SMO variant is its significant speedup over the other methods (about half the time of Platt SMO and one fourth the time of Keerthi 1 or 2).
Figure 4
The percent increase in iterations-to-convergence plotted against the 'C' value. For very low values of 'C' the gain is doubled, while for very large values of 'C' the gain is low (almost constant for C > 150). This illustrates the dependence of the gain on the 'C' value.
Figure 5
The number of bounded support vectors (BSVs) as a function of the 'C' value. There are many BSVs for very low values of 'C' and very few BSVs for large values of 'C'. Thus the number of BSVs plays a vital role in the speed of convergence of the algorithm.
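The trends in Figures 4 and 5 can be probed with any SVM implementation that exposes its dual coefficients. The sketch below uses scikit-learn's SVC as a stand-in for the SMO variants discussed here, with an illustrative data set and 'C' grid, and counts the bounded support vectors, i.e., those whose dual coefficient sits at the box constraint alpha_i = C.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Illustrative noisy binary problem (not channel blockade data).
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.1,
                           random_state=0)

for C in [0.1, 1.0, 10.0, 100.0, 200.0]:
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    # For a binary SVC, |dual_coef_| gives alpha_i; bounded SVs have alpha_i == C.
    alphas = np.abs(clf.dual_coef_).ravel()
    n_bsv = int(np.sum(np.isclose(alphas, C)))
    print(f"C = {C:6.1f}: {clf.support_.size:3d} SVs, {n_bsv:3d} bounded SVs")
```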
Figure 6
The Percentage Data Rejection vs. SN+SP curves are shown for test-data classification runs with a binary classifier, one molecule (the positive, given by label) versus the rest (the negative). Since the signal calling was not passed through a Decision Tree, these results do not accurately reflect total throughput, and they do not benefit from the "shielding" provided by the Decision Tree in the Fig. 1 prototype. The Relative Entropy Kernel is shown because it provided the best results (over Gaussian and Absdiff).
Figure 7
The Percentage Data Rejection vs. SN+SP curves are shown for test-data classification runs with a multiclass discriminator. The following criterion is used for dropping weak data: for any data point $x_i$, if $\max_m\{f_m(x_i)\} \le$ the Confidence Parameter, then the data point $x_i$ is dropped. For this data set the AbsDiff kernel (σ² = 0.2) performed best, and a confidence parameter of 0.8 achieved 100% accuracy.
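The rejection criterion quoted above translates directly into code. In the sketch below only the max_m{f_m(x_i)} ≤ 0.8 rule mirrors the caption; the data, kernel, and classifier are illustrative stand-ins, and rejected points are marked with the label -1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=5, cluster_std=2.5, random_state=1)
clf = SVC(kernel="rbf", gamma="scale",
          decision_function_shape="ovr").fit(X, y)

def classify_with_rejection(clf, X_test, confidence=0.8):
    """Predict labels; mark weak points (max_m f_m(x) <= confidence) as -1."""
    scores = clf.decision_function(X_test)        # shape (n_samples, n_classes)
    preds = clf.classes_[np.argmax(scores, axis=1)]
    preds[np.max(scores, axis=1) <= confidence] = -1
    return preds

preds = classify_with_rejection(clf, X, confidence=0.8)
accepted = preds != -1
print(f"rejected {np.mean(~accepted):.1%} of the data; "
      f"accuracy on accepted data: {np.mean(preds[accepted] == y[accepted]):.3f}")
```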
Figure 8
Shown is the schematic for an "external" SVM clustering algorithm.
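As a rough reconstruction of the flavor of such an external clustering loop (an assumption-laden sketch, not the authors' algorithm or their KKT-violator bookkeeping): start from random binary labels, train an SVM on them, relabel every point with the sign of the learned decision function, and iterate until the labelling stabilizes.

```python
# Hypothetical external-SVM-clustering loop: alternate between training an
# SVM on the current labels and relabelling points by the trained decision
# function, until no labels change (or an iteration cap is reached).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, _ = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=2)

rng = np.random.default_rng(0)
labels = rng.choice([-1, 1], size=len(X))         # random initial split

for iteration in range(50):
    if len(np.unique(labels)) < 2:                # avoid a degenerate split
        labels[rng.integers(len(labels))] *= -1
        continue
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, labels)
    new_labels = np.sign(clf.decision_function(X)).astype(int)
    new_labels[new_labels == 0] = 1
    changed = int(np.sum(new_labels != labels))
    labels = new_labels
    if changed == 0:                              # stable labelling: stop
        break

print(f"stopped after {iteration + 1} iterations; "
      f"cluster sizes: {np.bincount((labels + 1) // 2)}")
```

A refinement along the lines of Figure 10 would relax how many label changes (KKT violators) are tolerated per iteration on an annealing schedule rather than relabelling all points at once.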
Figure 9
(a) The percentage correct classification (an indication of clustering success) is shown with successive iterations of the clustering algorithm. Five separate test runs are shown, on different data from the same classes. Note the plateau at around 0.9; this is approximately the performance of a supervised binary SVM on the same data (i.e., perfect separation is not possible with this data without employing weak-data rejection). (b) The degradation in clustering performance for less optimal selection of kernel and tuning parameter (variance in the case of the Gaussian). (c) The degradation in clustering performance for non-optimal selection of kernel and tuning parameter (variance in the case of the Gaussian). (d) Summary of the degradation in clustering performance for less optimal selection of kernel and tuning parameter, with averages of the five test runs used as representative curves for each kernel/tuning selection in the above.
Figure 10
Efforts are underway to use simulated annealing on the number of KKT violators tolerated on each iteration of the external clustering algorithm, to accelerate the convergence (clustering) process. Our current approach (results shown) approximately halves the clustering time needed.
Figure 11
Several channel current cheminformatics tools are available for use via web interfaces at . These tools include a variety of SVM interfaces for classification and clustering (binary and multiclass), and HMM tools for feature extraction and structure identification (with applications to both channel current cheminformatics and computational genomics).

