I have the following code to plot my data using the t-SNE dimensionality reduction algorithm in MATLAB:

data=dlmread('features.txt');

meas=data(:,2:end);
species=data(:,1);

rng('default'); % for reproducibility
Y = tsne(meas,'Algorithm','exact','Distance','mahalanobis');
gscatter(Y(:,1),Y(:,2),species);
title('Mahalanobis');

However, when I run it I get the following error:

The covariance matrix for the Mahalanobis metric must be symmetric and positive definite.

Error in tsne (line 323) tempDistMat = pdist(X,distance);

Error in plotafeatures (line 7) Y = tsne(meas,'Algorithm','exact','Distance','mahalanobis');

With other distances the plot is produced correctly. What could be wrong with my code or data?

My data can be found HERE

1 Answer

The problem is indeed specifically with the Mahalanobis distance.

According to the tsne documentation, in the paragraph about distances:

'mahalanobis' — Mahalanobis distance, computed using the positive definite covariance matrix nancov(X).

It seems that the covariance matrix of your data, nancov(meas), doesn't fulfil this requirement. You can confirm it with the chol function. As its documentation says:

[R,p] = chol(A) for positive definite A (...) p is zero. If A is not positive definite, then p is a positive integer.
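
To make the meaning of p concrete, here is a minimal sketch on two small hand-picked matrices. A and B are made up for illustration and have nothing to do with the data in the question:

```matlab
% A is symmetric with eigenvalues 1 and 3, so it is positive definite.
A = [2 1; 1 2];
[~, pA] = chol(A);   % pA is 0

% B is symmetric with eigenvalues -1 and 3, so it is not positive definite.
B = [1 2; 2 1];
[~, pB] = chol(B);   % pB is a positive integer

fprintf('pA = %d, pB = %d\n', pA, pB);
```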

I tried with your data:

data=dlmread('features.txt');
meas=data(:,2:end);
[~, p] = chol(nancov(meas))

It returned p = 389, so nancov(meas) is not positive definite.
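
A common reason for this kind of failure, though I have not confirmed it is the cause for this particular file, is that the covariance matrix is rank deficient. That happens, for example, when there are fewer observations than features, or when some columns are constant or are linear combinations of others. Below is a minimal sketch of how one might check. It uses a small made-up matrix X in place of meas, and cov in place of nancov (the two agree when the data contains no NaN values):

```matlab
% Hypothetical toy data: 4 observations, 3 features, third column constant.
X = [1 2 5;
     2 1 5;
     3 5 5;
     4 3 5];

C = cov(X);            % same as nancov(X) when X has no NaN values
[~, p] = chol(C);      % p > 0 means C is not positive definite

nObs  = size(X, 1);    % number of observations (rows)
nFeat = size(X, 2);    % number of features (columns)
r     = rank(C);       % r < nFeat means C is singular

fprintf('p = %d, observations = %d, features = %d, rank(C) = %d\n', ...
        p, nObs, nFeat, r);
```

To run the same check on the real data, replace X with meas. If rank(C) comes out smaller than the number of columns, the covariance matrix is singular and the Mahalanobis distance cannot be computed from it.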

It works with other distances because they don't have that kind of requirement.