
I am trying to perform two-class classification using an SVM in MATLAB, labelling cell images as either 'Normal' or 'Infected'.

I use a training set which consists of 1000 Normal cell images and 300 Infected cell images. I extract 72 features from each of these cells, so my training feature matrix is 72x1300, where each row represents one feature and each column holds the feature values measured from one image.

data: 72x1300 double

My class label vector is initialized as:

cellLabel(1:1000) = {'normal'};
cellLabel(1001:1300) = {'infected'};

As suggested in these links: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf and svm scaling input values, I set about scaling the feature values like this:

for i = 1:size(data,1)                          % one pass per feature (row)
    mu(i) = mean(data(i,:));                    % mean of feature i across all 1300 images
    sd(i) = std(data(i,:));                     % std of feature i across all 1300 images
    scaledData(i,:) = (data(i,:) - mu(i)) ./ sd(i);
end

For testing, I read a test image and compute a 72x1 feature vector. Before classifying, I scale the test vector using the corresponding per-feature mean and standard deviation values computed from `data`, and then classify. With this approach I get 0% training accuracy. However, if I scale each class separately and then concatenate, I get 98% training accuracy. Can someone explain whether my method is correct? For the training accuracy I knew which image I was using, and hence which mean and SD values to read. How should I do it when the image's label is unknown?
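To make the intended test-time procedure concrete, here is a minimal sketch. It assumes `mu` and `sd` are the per-feature statistics computed once from the full training set (as in the loop above, never per class); `computeFeatures` is a hypothetical stand-in for the 72-feature extraction step, not a real function:

```matlab
% Scale an unknown test image with the stored TRAINING statistics
testVec = computeFeatures(testImage);        % hypothetical: returns a 72x1 feature vector
scaledTest = (testVec - mu(:)) ./ sd(:);     % same per-feature transform as the training data
```

The key point is that the same `mu` and `sd` are applied to every test vector, regardless of its (unknown) class.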

This is how I train:

[idx,z] = rankfeatures(data,cellLabel,'Criterion','wilcoxon','NUMBER',7);
rnkData = data(idx,:);
rnkData = rnkData';
cellLabel = cellLabel';
SVMModel = fitcsvm(rnkData,cellLabel,'Standardize',true,'KernelFunction','RBF','KernelScale','auto');

You can see I tried the built-in scaling option ('Standardize',true), but the classifier predicts 'normal' regardless of the input.
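For reference, classifying a new image with this trained model would look something like the sketch below (assuming the `idx` from rankfeatures and the `SVMModel` from fitcsvm above; `computeFeatures` is again a hypothetical stand-in for the feature extraction step):

```matlab
% Classify a new image with the trained SVM
testVec = computeFeatures(testImage);       % hypothetical: returns a 72x1 feature vector
rnkTest = testVec(idx)';                    % keep the same 7 ranked features, as a row vector
predictedLabel = predict(SVMModel, rnkTest);
```

Because the model was trained with 'Standardize',true, predict applies the stored training mean and standard deviation to the test row automatically.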

Comments:

"In the latest SVM implementation in MATLAB, the built-in functions scale the input by default." – Adriaan

"Can you post a snippet of how you're doing the training?" – Andrzej Pronobis

1 Answer


"corresponding mean and standard deviation values"

What do you mean by that? Do you have a mean and std. dev. for each feature? Why not use the actual min/max instead?

I'm not sure how feasible this is to implement in MATLAB, but in my OpenCV/SVM code I store the min/max values of each feature from the training data and use them to scale the corresponding feature of the test data.

If test-data values often fall outside the min/max range of the training data, that is a strong hint that the amount of training data is insufficient. With mean and std. dev. values you won't detect this as explicitly.
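In MATLAB terms, this min/max approach could be sketched as follows (assuming `data` is the 72x1300 training matrix with one feature per row, and `testVec` is a 72x1 test vector; variable names are illustrative):

```matlab
% Per-feature min/max computed once from the training data
fMin = min(data, [], 2);                    % 72x1, minimum of each feature (row)
fMax = max(data, [], 2);                    % 72x1, maximum of each feature (row)
fRange = fMax - fMin;

% Scale a 72x1 test vector with the stored training min/max
scaledTest = (testVec - fMin) ./ fRange;    % maps the training range to [0,1]

% Entries outside [0,1] indicate the test sample lies outside the
% range seen in training -- a hint that more training data may be needed
outOfRange = any(scaledTest < 0 | scaledTest > 1);
```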