
Any kernel matrix (or Gram matrix) calculated using arbitrary data is guaranteed to be positive semi-definite.

I have a matrix of data D where each row is a data vector. If I calculate the kernel like so,

K = D*D';

it turns out not to be positive semi-definite, let alone positive definite.

Why could this happen? Is there something I'm missing? My hunch is that it is most likely numerical error, since all the negative eigenvalues of my kernel matrix are on the order of -1e-12.
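For what it's worth, the behaviour is easy to reproduce; here is a NumPy sketch (the question uses MATLAB, so the function names differ) that builds the Gram matrix from random data and inspects its eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((450, 6))    # 450 samples, 6 features

K = D @ D.T                          # Gram matrix, 450 x 450

# K is symmetric up to floating-point round-off.
print(np.max(np.abs(K - K.T)))

# With only 6 features, K has rank at most 6, so 444 of its
# eigenvalues are exactly zero in exact arithmetic; round-off
# turns some of them into tiny negative numbers.
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min())                 # typically a tiny negative value
```

The tiny negative minimum eigenvalue here is exactly the ~1e-12 effect described in the question: a rank-deficient Gram matrix whose zero eigenvalues are perturbed by round-off.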

This question somewhat alludes to an answer, but I cannot imagine why the matrix would not be at least symmetric!

can you provide an example of D that is failing? - Buck Thorn
So my dataset is 456 x 6, but even with D = rand(450, 6) you will likely be able to reproduce my issue, which makes it all the more baffling. - devmax

1 Answer


general note

First, check carefully whether your D is correctly oriented. If you are using

K = D*D';

then you need D with dimensions N x d, where N is the number of samples and d is the number of features (in other words, a row-oriented dataset). Otherwise you will still get a valid Gramian, but for a different problem (and for d >> N it could lead to more numerical instability).
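To make the orientation point concrete, here is a small NumPy illustration (shapes chosen for illustration): with a row-oriented N x d dataset, `D @ D.T` gives the N x N Gram matrix of inner products between samples, while the other orientation gives a d x d feature Gramian, a valid but different object:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 3
D = rng.standard_normal((N, d))      # row-oriented: one sample per row

K_samples = D @ D.T                  # N x N: inner products between samples
K_features = D.T @ D                 # d x d: a different (feature) Gramian

print(K_samples.shape)               # (5, 5)
print(K_features.shape)              # (3, 3)

# Entry (i, j) of K_samples is the dot product of samples i and j.
assert np.isclose(K_samples[0, 1], D[0] @ D[1])
```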

positive semi-definite Gram matrix

The resulting matrix will always be symmetric (unless your arithmetic operations are non-deterministic).

Positive semi-definiteness is also guaranteed, and the only possible reason for its absence is numerical inaccuracy. If this happens, consider reducing the dimensionality of D through some soft technique, for example PCA with a very high k (say, half the original dimensionality). Another trick which might help (although it introduces an additional mathematical constraint on the problem) is computing:

K = D*D' + eps*I

where eps is a small constant (say 1e-10, so that it is larger than the magnitude of your negative eigenvalues) and I is the N x N identity matrix. This technique has many names depending on the field of study; one of them is regularization.
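A minimal NumPy sketch of this jitter/regularization trick (a smaller dataset and a hand-picked eps, chosen just above the magnitude of the spurious negative eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((50, 6))       # 50 samples, 6 features
K = D @ D.T                            # rank <= 6, so only p.s.d.

eps = 1e-10
K_reg = K + eps * np.eye(K.shape[0])   # shifts every eigenvalue up by eps

# The shift lifts the tiny negative round-off eigenvalues above zero,
# making K_reg positive definite; e.g. a Cholesky factorization,
# which requires a p.d. matrix, now succeeds.
np.linalg.cholesky(K_reg)
print(np.linalg.eigvalsh(K_reg).min())  # strictly positive
```

The trade-off mentioned above is visible here: every eigenvalue, not just the spurious ones, is shifted by eps, so the regularized matrix solves a slightly different problem.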

positive definite Gram matrix

A positive definite Gram (kernel) matrix is much less common: it is p.d. iff your vectors are linearly independent (so in particular you need d >= N; linear independence is strictly stronger, so this is a necessary condition, not an equivalence). For this reason many kernel matrices are obtained through complex projections which ensure linear independence; for example, the RBF kernel induces linear independence (in the feature space!), so a kernel matrix based on the RBF kernel should be p.d. (up to numerical errors, and assuming your dataset is consistent, i.e. each point is different from the rest).
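To illustrate the last point, here is a NumPy sketch computing an RBF (Gaussian) kernel matrix over distinct points (the bandwidth gamma is an arbitrary choice); unlike the linear kernel on the same data, it comes out positive definite:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 6))     # 50 distinct points in R^6

gamma = 0.5
# Squared Euclidean distances between all pairs of rows.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-gamma * sq_dists)    # RBF kernel matrix, 50 x 50

# The linear kernel X @ X.T has rank at most 6 here (only p.s.d.),
# but the RBF kernel matrix of distinct points has all eigenvalues > 0.
print(np.linalg.eigvalsh(K_rbf).min())  # strictly positive
```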