
Non-linear kernels allow the SVM to separate data that is not linearly separable by implicitly mapping it into a higher-dimensional space, where a linear separator can be found. The RBF kernel is probably the most popular non-linear kernel.

I was told that the RBF kernel is Gaussian, and therefore infinitely differentiable. Because of this property, the RBF kernel can supposedly map the data from a low-dimensional space to an INFINITE-dimensional space. I have two questions:

1) Could anyone explain why the dimension of the feature space after the mapping corresponds to the differentiability of the kernel? I am not clear on this part.

2) There are many non-linear kernels, such as the polynomial kernel, and I believe they are also able to map the data from a low-dimensional space to an infinite-dimensional space. Why, then, is the RBF kernel more popular than they are?

Thank you in advance for your help.


1 Answer


1) Could anyone explain why the dimension of the feature space after the mapping corresponds to the differentiability of the kernel? I am not clear on this part.

It has nothing to do with being differentiable: the linear kernel is also infinitely differentiable, yet it does not map into any higher-dimensional space. Whoever told you that differentiability is the reason either misled you or did not understand the math behind it. The infinite dimension comes from the mapping

phi(x) = Nor(x, sigma^2)

in other words, you are mapping your point to a function, namely the Gaussian density centred at that point. This function is an element of L^2, the infinite-dimensional space of square-integrable functions, in which the scalar product is defined as the integral of the product of two functions, so

<f,g> = int f(a)g(a) da

and as such

<phi(x),phi(y)> = int Nor(x,sigma^2)(a)Nor(y,sigma^2)(a) da 
                = X exp(-(x-y)^2 / (4sigma^2) )

for some normalising constant X (which is completely unimportant here). In other words, the Gaussian kernel is a scalar product between two functions that live in an infinite-dimensional space.
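
As a quick sanity check, here is a minimal sketch (assuming NumPy and SciPy are available; the particular x, y, and sigma values are arbitrary) that computes the inner product above by numerical integration for one-dimensional inputs and compares it with the closed form, where X = 1 / (2 sigma sqrt(pi)):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    sigma = 1.0
    x, y = 0.3, 1.7   # two arbitrary 1-D input points

    # <phi(x), phi(y)> computed directly as the L^2 integral
    # int Nor(x, sigma^2)(a) * Nor(y, sigma^2)(a) da
    inner, _ = quad(
        lambda a: norm.pdf(a, loc=x, scale=sigma) * norm.pdf(a, loc=y, scale=sigma),
        -np.inf, np.inf,
    )

    # Closed form X * exp(-(x - y)^2 / (4 sigma^2)), with X = 1 / (2 sigma sqrt(pi))
    X = 1.0 / (2.0 * sigma * np.sqrt(np.pi))
    closed_form = X * np.exp(-((x - y) ** 2) / (4.0 * sigma ** 2))

    print(inner, closed_form)  # both approximately 0.1728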

2) There are many non-linear kernels, such as the polynomial kernel, and I believe they are also able to map the data from a low-dimensional space to an infinite-dimensional space. Why, then, is the RBF kernel more popular than they are?

The polynomial kernel maps into a feature space with O(d^p) dimensions, where d is the input space dimension and p is the polynomial degree, so it is far from infinite. Why is the Gaussian kernel so popular? Because it works well, is quite easy to use, and is fast to compute. From a theoretical point of view it also comes with a guarantee: with a small enough variance it can fit any arbitrary labelling of a finite set of distinct points.
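
To illustrate that last guarantee, here is a minimal sketch (assuming scikit-learn; the data and the gamma and C values are arbitrary choices for the demonstration, and sklearn's gamma plays the role of 1 / (2 sigma^2), so a large gamma means a small variance):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))      # 40 distinct points in 2-D
    y = rng.integers(0, 2, size=40)   # arbitrary (random) labels

    # Large gamma = tiny kernel width: each point only "sees" itself,
    # so the RBF machine has enough capacity to interpolate any labelling.
    rbf = SVC(kernel="rbf", gamma=100.0, C=1e6).fit(X, y)

    # A fixed-degree polynomial kernel has only O(d^p) features and
    # typically cannot shatter 40 randomly labelled points.
    poly = SVC(kernel="poly", degree=3, C=1e6).fit(X, y)

    print("RBF  train accuracy:", rbf.score(X, y))   # expected 1.0
    print("poly train accuracy:", poly.score(X, y))  # typically below 1.0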