Mathematica Journal
Volume 10, Issue 1
A Flexible Implementation for Support Vector Machines
Roland Nilsson
Johan Björkegren
Jesper Tegnér

Feature Space and Kernels

In all of the preceding examples, the separating surface is assumed to be linear (a hyperplane). This is often a serious limitation: many pattern recognition problems are inherently nonlinear in the input data and require nonlinear separating surfaces. To overcome this obstacle, for each specific problem we devise an appropriate transformation Phi from the input space X (the domain of the original data) to a feature space F. The function Phi is chosen so that a hyperplane in F corresponds to some desirable class of surfaces in X. (Choosing this function for a specific problem is something of an art, but we can always try a few different Phi known to have been successful in similar problems and simply hope for the best.)

As an example, consider a second-degree polynomial surface in X = R^2, described by

w1 x1^2 + w2 x2^2 + w3 x1 x2 + w4 x1 + w5 x2 + b = 0.

We may represent such a surface by choosing a mapping Phi: R^2 -> R^6 as

Phi(x) = (x1^2, x2^2, √2 x1 x2, √2 x1, √2 x2, 1)    (6)

and forming a hyperplane in R^6 defined by w · z + b = 0, where z = Phi(x). If we now solve the problem in the new variables z_i = Phi(x_i), we obtain a wide-margin hyperplane in the feature space F = R^6, corresponding to a second-degree surface in X.
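To make the correspondence concrete, here is a minimal numpy sketch (the function names are illustrative, not part of MathSVM) showing that a quadratic curve in input space, such as the unit circle, becomes an ordinary hyperplane after a degree-2 feature mapping:

```python
import numpy as np

def phi(x):
    """Degree-2 feature map for 2-dimensional input; the sqrt(2) factors
    are a common convention that simplifies feature-space dot products."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([x1**2, x2**2, s * x1 * x2, s * x1, s * x2, 1.0])

# The unit circle x1^2 + x2^2 - 1 = 0, a quadratic surface in input space,
# is the hyperplane w . z + b = 0 in feature space with the weights below.
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
b = -1.0

point_on_circle = np.array([np.cos(0.3), np.sin(0.3)])
print(w @ phi(point_on_circle) + b)   # ~0: the point lies on the hyperplane
```

Any point on the circle satisfies the linear equation in the new variables, which is exactly what lets a linear large-margin method separate nonlinear data.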

The problem with this approach is that nonlinear Phi-transformations may have huge dimensions. Even for the simple quadratic surface considered here, we obtain dim F = (n+1)(n+2)/2 when X = R^n. For increasing polynomial degree, this number grows quickly, and there are even nonlinear functions that result in infinite-dimensional F. This is why we do not solve the primal problem (2) directly: w may not be a finite vector, but alpha always is.

This problem is elegantly solved by kernels. It turns out that there are many functions Phi for which we can compute scalar products in F implicitly, without actually calculating Phi. In fact, this is all we need to solve the SVM formulation (1). Thus we define the kernel function as

k(x, y) = Phi(x) · Phi(y).

In the case of example (6), it turns out that Phi(x) · Phi(y) = (x · y + 1)^2, which is why we chose that specific form of Phi (explaining the two separate terms x · y and 1). It is not difficult to prove that this result holds for any polynomial degree d; therefore, using the polynomial kernel

k(x, y) = (x · y + 1)^d,

we can obtain any polynomial separating surface.
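The identity behind the polynomial kernel can be checked numerically. The sketch below (assuming the standard degree-2 feature map with √2 factors; names are illustrative) confirms that dot products of mapped vectors collapse to a closed form that never constructs the feature vectors at all:

```python
import numpy as np

def phi2(x):
    # degree-2 feature map over R^2 whose dot products have a closed form
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

def poly_kernel(x, y, d=2):
    # polynomial kernel: the feature-space scalar product, computed implicitly
    return (np.dot(x, y) + 1.0) ** d

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(phi2(x) @ phi2(y), poly_kernel(x, y)))   # True
```

The kernel evaluation costs one dot product in input space, regardless of how large the feature space is.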

A Nonlinear Example: Using Kernels

Let us see how kernels are handled in MathSVM to solve nonlinear problems. The second-degree kernel in (6) is provided by the package's polynomial kernel function.

Here is some data that a linear classifier cannot possibly cope with.

Let us solve this problem using the polynomial kernel. This is done as before, by supplying the desired kernel (which can be any function accepting two arguments) using the KernelFunction option.
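For readers without MathSVM at hand, the same "plug in a kernel" workflow can be sketched in Python. The snippet below is not MathSVM's optimizer; it is a minimal kernel perceptron, which reuses the identical idea of replacing every dot product with a kernel call, applied to XOR-style data that no linear classifier can separate:

```python
import numpy as np

def poly_kernel(x, y, d=2):
    # degree-2 polynomial kernel, playing the role of the KernelFunction option
    return (np.dot(x, y) + 1.0) ** d

# XOR-like data: no line in input space separates the two classes
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# Kernel perceptron: alpha_i counts the mistakes made on sample i;
# the decision function is f(x) = sum_i alpha_i y_i k(x_i, x).
alpha = np.zeros(len(X))
for _ in range(20):                       # a few passes over the data
    for i, x in enumerate(X):
        f = sum(alpha[j] * y[j] * poly_kernel(X[j], x) for j in range(len(X)))
        if y[i] * f <= 0:                 # mistake: reinforce this sample
            alpha[i] += 1.0

pred = [int(np.sign(sum(alpha[j] * y[j] * poly_kernel(X[j], x)
                        for j in range(len(X))))) for x in X]
print(pred)   # [1, 1, -1, -1]
```

With the quadratic kernel the data become separable, and every training point is classified correctly; swapping in a plain dot product in place of `poly_kernel` would never converge on this data.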

When visualizing the results, SVMPlot can use the kernel function to draw the nonlinear decision curves.

High-Dimensional Input Spaces

An interesting consequence of the kernel idea is that the dimensionality of the input space does not affect the time complexity of the SVM algorithm. Since the solution is computed using only dot products between samples (in input space or some feature space), high-dimensional problems are solved equally fast (not counting the time used to precalculate the kernel matrix K, which is usually negligible). As an example of this, consider a problem of much higher input dimension.
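The point is easy to see by building the kernel matrix directly: its size depends only on the number of samples m, never on the input dimension n, which enters only through the cost of each individual dot product. A small numpy sketch (sizes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 10_000            # few samples, very high-dimensional input
X = rng.normal(size=(m, n))

# The kernel matrix K_ij = x_i . x_j is all the optimizer ever touches;
# it is m x m regardless of the input dimension n.
K = X @ X.T
print(K.shape)   # (40, 40)
```

Once K is formed, the optimization works on a 40-by-40 matrix whether the inputs had 2 coordinates or 10,000.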

The kernel matrix K is still only m × m, where m is the number of samples.

The SVM algorithm is still fast, although the problem is much harder due to the extremely low sample density, which is reflected in a larger number of support vectors (the samples with nonzero alpha_i).

The solution in this case may be viewed by projecting the data onto the weight vector w. The separation looks almost perfect, although there may be problems with overfitting (the solution may not work well when applied to new, unseen examples). However, this problem is outside the scope of the present article.
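The projection itself never requires w explicitly: since w = Σ_i alpha_i y_i x_i, the projection w · x of any sample is again just a combination of dot products. A sketch with stand-in multipliers (the alpha values here are arbitrary, purely to illustrate the algebra):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))          # 5 training samples in R^3
y = np.array([1, -1, 1, 1, -1])
alpha = rng.uniform(size=5)          # stand-in multipliers, illustration only

w = (alpha * y) @ X                  # w = sum_i alpha_i y_i x_i

x_new = rng.normal(size=3)
direct = w @ x_new                           # projection using w explicitly
via_kernel = (alpha * y) @ (X @ x_new)       # same value from dot products alone
print(np.isclose(direct, via_kernel))   # True
```

This is the same trick that makes infinite-dimensional feature spaces usable: w may not be representable, but w · Phi(x) always is, via the kernel.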

© Wolfram Media, Inc. All rights reserved.