Volume 10, Issue 1


A Flexible Implementation for Support Vector Machines

# Feature Space and Kernels

In all of the preceding examples, the separating surface is assumed to be linear (a hyperplane). This is often a serious limitation, as many pattern recognition problems are inherently nonlinear in the input data and require nonlinear separating surfaces. To overcome this obstacle, for each specific problem we devise some appropriate transformation Φ from the input space X (the domain of the original data) to a feature space F. The function Φ is chosen so that a hyperplane in F corresponds to some desirable class of surfaces in X. (Choosing this function for a specific problem is something of an art, but we can always try a few different Φ known to have been successful in similar problems and simply hope for the best.)

As an example, consider a second-degree polynomial surface in X = R², described by

c₁x₁² + c₂x₂² + c₃x₁x₂ + c₄x₁ + c₅x₂ + c₆ = 0.

We may represent such a surface by choosing a mapping Φ as

Φ(x) = (x₁², x₂², √2 x₁x₂, √2 x₁, √2 x₂, 1)

and forming a hyperplane in F defined by w · z + b = 0, where z = Φ(x). If we now solve the problem in the new variables z, we obtain a wide-margin hyperplane in F corresponding to a second-degree surface in X.
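The explicit feature map can be written down in a few lines. The sketch below is plain Python for illustration, not MathSVM code; the function name `phi` and the ordering of the components are assumptions.

```python
import math

def phi(x):
    """Hypothetical explicit second-degree feature map for x = (x1, x2):
    Phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2, sqrt(2)*x1, sqrt(2)*x2, 1)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

# A hyperplane w . z + b = 0 in the 6-dimensional feature space
# corresponds to a second-degree surface in the 2-dimensional input space.
z = phi((3.0, -1.0))
print(len(z))  # 6
```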

The problem with this approach is that nonlinear Φ-transformations may produce feature spaces of huge dimension. Even for the simple quadratic surface considered here, we obtain dim F = 6 when X = R². For increasing polynomial degree, this number grows quickly, and there are even nonlinear functions Φ that result in an infinite-dimensional F. This is why we do not solve the primal problem (2) directly: w may not be a finite vector, but α always is.

This problem is elegantly solved by kernels. It turns out that there are many functions Φ for which we can compute scalar products in F implicitly, without actually calculating Φ(x). And in fact, this is all we need for solving the SVM formulation (1). Thus we define the kernel function as

K(x, y) = Φ(x) · Φ(y).

In the case of example (6), it turns out that K(x, y) = (x · y + 1)², which is why we chose that specific form of Φ (explaining the √2 factors and the constant term). It is not difficult to prove that this result holds for any polynomial degree d; therefore, using the polynomial kernel

K(x, y) = (x · y + 1)^d

we can obtain polynomial separating surfaces of any degree.
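The kernel identity can be checked numerically. The sketch below (plain Python, not MathSVM) compares the explicit scalar product Φ(x) · Φ(y) with the closed form (x · y + 1)² on a sample pair; the names `phi` and `poly_kernel` are illustrative.

```python
import math

def phi(x):
    # Explicit second-degree feature map (illustrative component ordering)
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

def poly_kernel(x, y, d=2):
    # Polynomial kernel K(x, y) = (x . y + 1)^d
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))  # Phi(x) . Phi(y)
implicit = poly_kernel(x, y)                           # (x . y + 1)^2
print(abs(explicit - implicit) < 1e-9)  # True: the two values agree
```

The implicit form needs only the 2-dimensional dot product, never the 6-dimensional vectors, which is the whole point of the kernel trick.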

## A Nonlinear Example: Using Kernels

Let us see how kernels are handled in MathSVM to solve nonlinear problems. The second-degree kernel in (6) is provided by

Here is some data a linear classifier cannot possibly cope with.

Let us solve this problem using the polynomial kernel. This is done as before, by supplying the desired kernel (which can be any function accepting two arguments) using the KernelFunction option.

When visualizing the results, SVMPlot can use the kernel function to draw the corresponding nonlinear decision curve.

## High-Dimensional Input Spaces

An interesting consequence of the kernel idea is that the dimensionality of the input space does not matter for the time complexity of the SVM algorithm. Since the solution is computed using only dot products between samples (in input space or some feature space), high-dimensional problems are solved equally fast (not counting the time used to precalculate the kernel matrix K, which is usually negligible). As an example of this, consider a problem of much higher input dimension.

The kernel matrix K is still just N × N, where N is the number of samples.
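This can be seen in a small sketch (plain Python with synthetic data, not MathSVM): the Gram matrix Kᵢⱼ = K(xᵢ, xⱼ) has one entry per pair of samples, regardless of the input dimension.

```python
import random

random.seed(0)
N, dim = 5, 1000  # few samples, very high input dimension
data = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(N)]

def poly_kernel(x, y, d=2):
    # Polynomial kernel K(x, y) = (x . y + 1)^d
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d

# Kernel (Gram) matrix: N x N, independent of dim
K = [[poly_kernel(xi, xj) for xj in data] for xi in data]
print(len(K), len(K[0]))  # 5 5
```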

The SVM algorithm is still fast, although the problem is much harder due to the extremely low sample density, which is reflected in a larger number of support vectors (the nonzero αᵢ).

The solution in this case may be viewed using the projection of the data onto the weight vector w. The separation looks almost perfect, although there will be problems with overfitting (the solution may not work well when applied to new, unseen examples). However, this problem is outside the scope of the present article.
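Although w = Σᵢ αᵢyᵢΦ(xᵢ) may live in a very high-dimensional feature space, the projection w · Φ(x) can still be evaluated through the kernel alone. A minimal sketch (plain Python; the support vectors, labels, and α values below are made up for illustration, not solver output):

```python
def poly_kernel(x, y, d=2):
    # Polynomial kernel K(x, y) = (x . y + 1)^d
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d

# Hypothetical support vectors as (sample, label, alpha) triples;
# in a real run these would come from solving the dual problem.
support = [((1.0, 2.0), 1, 0.5), ((3.0, -1.0), -1, 0.5)]

def project(x):
    # w . Phi(x) = sum_i alpha_i * y_i * K(x_i, x): no explicit Phi needed
    return sum(a * y * poly_kernel(xi, x) for xi, y, a in support)

print(project((1.0, 2.0)))  # 16.0
```

Plotting this one-dimensional projection for every sample is what makes the "almost perfect" separation visible even when F itself cannot be drawn.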