Volume 10, Issue 1
A Flexible Implementation for Support Vector Machines
Feature Space and Kernels

In all of the preceding examples, the separating surface is assumed to be linear (a hyperplane). This is often a serious limitation, as many pattern recognition problems are inherently nonlinear in the input data and require nonlinear separating surfaces. To overcome this obstacle, for each specific problem we devise some appropriate transformation Φ from the input space X (the domain of the original data) to a feature space F. The function Φ is chosen so that a hyperplane in F corresponds to some desirable class of surfaces in X. (Choosing this function for a specific problem is something of an art, but we can always try a few mappings Φ known to have been successful in similar problems and simply hope for the best.)

As an example, consider a second-degree polynomial surface in X = ℝⁿ, described by an equation of degree two in the coordinates of x. We may represent such a surface by choosing a mapping Φ : ℝⁿ → F as

Φ(x) = (1, √2·x₁, …, √2·xₙ, x₁², …, xₙ², √2·x₁x₂, …, √2·xₙ₋₁xₙ)
and forming a hyperplane in F defined by w·Φ(x) + b = 0, where w ∈ F. If we now solve the problem in the new variables Φ(x), we obtain a wide-margin hyperplane in F corresponding to a second-degree surface in X.

The problem with this approach is that nonlinear transformations Φ may have huge dimensions. Even for the simple quadratic surface considered here, we obtain dim F = (n+1)(n+2)/2, which quickly becomes unwieldy as n grows. For increasing polynomial degree, this number grows quickly, and there are even nonlinear functions Φ that result in an infinite-dimensional F. This is why we do not solve the primal problem (2) directly: w may not be a finite-dimensional vector, but the vector of dual variables α always is.

This problem is elegantly solved by kernels. It turns out that there are many functions Φ for which we can compute scalar products in F implicitly, without actually calculating Φ(x). And in fact, this is all we need for solving the SVM formulation (1). Thus we define the kernel function as

K(x, y) = Φ(x)·Φ(y).
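To make the growth of the explicit feature space concrete: the number of monomials of degree at most d in n variables is C(n+d, d), which is the dimension of the explicit polynomial feature space. A small sketch (the function name and the example values of n and d are my own, for illustration):

```python
from math import comb

def feature_dim(n, d):
    """Dimension of the explicit feature space for a degree-d
    polynomial surface in n variables: the number of monomials
    of degree <= d, i.e. C(n + d, d)."""
    return comb(n + d, d)

# The dimension explodes as the polynomial degree d increases:
print([feature_dim(20, d) for d in (1, 2, 3, 4)])
# → [21, 231, 1771, 10626]
```

For d = 2 this reproduces the (n+1)(n+2)/2 count of the quadratic example above.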
In the case of example (6), it turns out that K(x, y) = (x·y + 1)², which is why we chose that specific form of Φ (explaining the two separate kinds of terms, √2·xᵢ and xᵢ²). It is not difficult to prove that this result holds for any polynomial degree d; therefore, using the polynomial kernel

K(x, y) = (x·y + 1)^d
we can obtain any polynomial separating surfaces.

A Nonlinear Example: Using Kernels

Let us see how kernels are handled in MathSVM to solve nonlinear problems. The second-degree kernel in (6) is provided by a kernel function built into MathSVM.
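The MathSVM definitions themselves are Mathematica code and are not reproduced in this extract. As a language-neutral sketch (here in Python with NumPy; all names are my own), this is the second-degree polynomial kernel together with a numerical check that it really computes the feature-space dot product implicitly:

```python
import numpy as np

def polynomial_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x.y + 1)^d."""
    return (np.dot(x, y) + 1.0) ** d

def phi(x):
    """Explicit second-degree feature map: a constant component,
    the sqrt(2)*x_i terms, the x_i^2 terms, and the sqrt(2)*x_i*x_j
    cross terms (i < j)."""
    n = len(x)
    feats = [1.0]
    feats += [np.sqrt(2) * xi for xi in x]
    feats += [xi * xi for xi in x]
    feats += [np.sqrt(2) * x[i] * x[j]
              for i in range(n) for j in range(i + 1, n)]
    return np.array(feats)

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 2.0])
# The kernel computes the dot product in F without ever forming phi:
assert np.isclose(polynomial_kernel(x, y), np.dot(phi(x), phi(y)))
```

Note that `phi(x)` has (n+1)(n+2)/2 components, while the kernel itself costs only one dot product in the input space.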
Here is some data a linear classifier cannot possibly cope with.
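The article's dataset is not reproduced in this extract; a classic example of such data is two concentric rings, sketched here in Python (the radii, sample counts, and noise level are my own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def ring(radius, m, noise=0.1):
    """Sample m points near a circle of the given radius."""
    angles = rng.uniform(0, 2 * np.pi, m)
    r = radius + noise * rng.standard_normal(m)
    return np.column_stack([r * np.cos(angles), r * np.sin(angles)])

# An inner ring (class +1) surrounded by an outer ring (class -1):
# no straight line in the plane can separate these two classes.
X = np.vstack([ring(1.0, 50), ring(3.0, 50)])
y = np.concatenate([np.ones(50), -np.ones(50)])
```

A second-degree surface (an ellipse between the rings), however, separates the classes perfectly.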
Let us solve this problem using the polynomial kernel. This is done as before, by supplying the desired kernel (which can be any function accepting two arguments) using the KernelFunction option.
When visualizing the results, SVMPlot can use the kernel function to draw the resulting nonlinear decision curves.
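MathSVM's solver is not shown in this extract. As a stand-in, a kernel perceptron (a much simpler dual algorithm, not an SVM) illustrates the two essential mechanics: training touches the data only through kernel evaluations, and the decision function has the form f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x); all names below are my own:

```python
import numpy as np

def kernel_perceptron(X, y, K, epochs=10):
    """Learn dual coefficients alpha. A stand-in for an SVM solver,
    used only to show that the algorithm needs nothing but the
    kernel values K(x_i, x_j)."""
    m = len(y)
    G = np.array([[K(X[i], X[j]) for j in range(m)] for i in range(m)])
    alpha = np.zeros(m)
    for _ in range(epochs):
        for i in range(m):
            # Update on every point that is currently misclassified.
            if y[i] * np.dot(alpha * y, G[:, i]) <= 0:
                alpha[i] += 1.0
    return alpha

def decision(x, X, y, alpha, K):
    """f(x) = sum_i alpha_i y_i K(x_i, x); sign(f) is the predicted class."""
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X))

# XOR-like data: not linearly separable, but trivially separated
# by a second-degree polynomial kernel.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = lambda u, v: (np.dot(u, v) + 1.0) ** 2

alpha = kernel_perceptron(X, y, K)
preds = [np.sign(decision(x, X, y, alpha, K)) for x in X]
# preds recovers y: [1.0, 1.0, -1.0, -1.0]
```

Plotting a decision curve works the same way: evaluate f on a grid of points and draw the contour f(x) = 0.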
High-Dimensional Input Spaces

An interesting consequence of the kernel idea is that the dimensionality of the input space does not matter for the time complexity of the SVM algorithm. Since the solution is computed using only dot products between samples (in input space or in some feature space), high-dimensional problems are solved equally fast (not counting the time used to precalculate the kernel matrix K, which is usually not noticeable). As an example of this, consider a problem with very high input dimension n.
The kernel matrix K is still just m × m, where m is the number of samples.
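A minimal sketch of this point (the values of m and n are arbitrary, chosen for illustration): the Gram matrix has one entry per pair of samples, so its size, and the cost of everything downstream, depends only on m, never on n.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 40, 5000  # few samples in a very high-dimensional input space
X = rng.standard_normal((m, n))

# Second-degree polynomial Gram matrix, computed in one shot:
G = (X @ X.T + 1.0) ** 2

assert G.shape == (m, m)    # size set by m alone, not by n
assert np.allclose(G, G.T)  # kernel matrices are symmetric
```

Raising n only changes the cost of the initial dot products X @ X.T; the optimization itself sees the same m × m matrix.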
The SVM algorithm is still fast, although the problem is much harder due to the extremely low sample density, which is reflected in a larger number of support vectors (the nonzero αᵢ).
The solution in this case may be viewed using the projection of the data onto the weight vector w. The separation looks almost perfect, although there will be problems with overfitting (the solution may not work well when applied to new, unseen examples). However, this problem is outside the scope of the present article.


© Wolfram Media, Inc. All rights reserved.