2 votes

I'm using LIBSVM and MATLAB to classify 34x5 data into 3 classes. I applied 10-fold cross-validation and an RBF kernel. The output has a 0.88 correct rate (88% accuracy). This is my confusion matrix:

9   0   0
0   3   0
0   4   18

I would like to know which SVM options I should consider to improve the accuracy, or which other machine learning classification methods might work better. Any help?

Here is my SVM classification code:

load Turn180SVM1; % load data matrix (last column holds the class labels)
libsvm_options = '-s 1 -t 2 -d 3 -r 0 -c 1 -n 0.1 -p 0.1 -m 100 -e 0.000001 -h 1 -b 0 -wi 1 -q'; % svm options

C = size(Turn180SVM1,2); % column C is the class label

% repeated 10-fold cross-validation
cp = classperf(Turn180SVM1(:,C)); % initialise once so results accumulate over all repetitions
for i = 1:10
    indices = crossvalind('Kfold', Turn180SVM1(:,C), 10);
    for j = 1:10
        X = find(indices == j); % testing fold
        Y = find(indices ~= j); % training folds

        feature_training = Turn180SVM1(Y, 1:C-1);
        feature_testing  = Turn180SVM1(X, 1:C-1);
        class_training   = Turn180SVM1(Y, end);
        class_testing    = Turn180SVM1(X, end);

        % SVM training (scale using training-set statistics only)
        disp('training');
        [feature_training, ps] = mapminmax(feature_training', 0, 1);
        feature_training = feature_training';
        feature_testing  = mapminmax('apply', feature_testing', ps)';
        model = svmtrain(class_training, feature_training, libsvm_options);

        % SVM prediction
        disp('testing');
        TestPredict = svmpredict(class_testing, feature_testing, model);
        TestErrap = sum(TestPredict ~= class_testing) / length(class_testing) * 100; % per-fold test error (%)
        cp = classperf(cp, TestPredict, X);
        disp((i-1)*10 + j);
    end
end
[ConMat, order] = confusionmat(class_testing, TestPredict); % last fold only; cp.CountingMatrix aggregates all folds
cp.CorrectRate    % overall accuracy
cp.CountingMatrix % aggregated confusion matrix
From the above confusion matrix, it seems that you don't have a lot of instances. If you can, try to get more data, which might help (as long as you understand the bias-variance tradeoff, e.g. by plotting learning curves). – Amro
Yes, true. I have 34 instances. How do I plot the learning curve? – user1629213
But what's the command to plot it? – user1629213
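A minimal learning-curve sketch in MATLAB, in the spirit of Amro's comment: train on growing subsets of the data and plot training vs. test accuracy. The hold-out fraction and subset sizes here are illustrative assumptions, and the feature scaling from the question is omitted for brevity.

% Learning curve: accuracy on train vs. held-out test data as the
% training set grows. Assumes Turn180SVM1 and libsvm_options as above.
n = size(Turn180SVM1,1);
idx = randperm(n);
nTest = round(0.3*n);                        % hold out ~30% for testing
test  = idx(1:nTest);
train = idx(nTest+1:end);
sizes = round(linspace(5, numel(train), 6)); % growing training-set sizes
accTrain = zeros(size(sizes));
accTest  = zeros(size(sizes));
for k = 1:numel(sizes)
    s = train(1:sizes(k));
    model = svmtrain(Turn180SVM1(s,end), Turn180SVM1(s,1:end-1), libsvm_options);
    pTr = svmpredict(Turn180SVM1(s,end), Turn180SVM1(s,1:end-1), model);
    pTe = svmpredict(Turn180SVM1(test,end), Turn180SVM1(test,1:end-1), model);
    accTrain(k) = mean(pTr == Turn180SVM1(s,end));
    accTest(k)  = mean(pTe == Turn180SVM1(test,end));
end
plot(sizes, accTrain, '-o', sizes, accTest, '-s');
xlabel('training set size'); ylabel('accuracy');
legend('train', 'test', 'Location', 'best');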

1 Answer

3 votes

Many methods exist. If your tuning procedure is optimal (e.g. well-executed cross-validation), your choices include:

  1. Improve preprocessing, perhaps tailoring new aggregated features based on domain knowledge. Most importantly (and most effectively): make sure your inputs are standardized properly, for example by scaling every dimension onto [-1, 1] (see the scaling sketch after this list).

  2. Use another kernel: RBF kernels are known to perform very well in a wide variety of settings, but specialised kernels exist for many tasks. Don't consider this unless you know what you are doing. Since you are dealing with a low-dimensional problem, RBF is probably a good choice if your data is not structured. (The standard LIBSVM kernel options are sketched after this list.)

  3. Reweigh training instances: this is particularly important when your data set is unbalanced (e.g. some classes have far fewer instances than others). You can do this with the -wX options in LIBSVM (see the example after this list). All sorts of reweighting schemes exist, including variants of boosting. I'm not a major fan of these, since such approaches are prone to overfitting.

  4. Change the cross-validation cost function to suit your exact needs. Is accuracy really what you are looking for, or do you want, say, high F1 or high ROC-AUC? It is surprising how many people optimize a performance measure they are not really interested in. (A per-class F1 computation from the confusion matrix is sketched below.)
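For point 1, a minimal sketch of scaling every dimension onto [-1, 1] with mapminmax, fitting the mapping on the training folds only. The variable names match the question's code; the [-1, 1] range replaces the [0, 1] used there.

% Fit the scaling on the training data only, then apply it to the
% test data; mapminmax scales along rows, hence the transposes.
[feature_training, ps] = mapminmax(feature_training', -1, 1);
feature_training = feature_training';
feature_testing  = mapminmax('apply', feature_testing', ps)';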
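For point 2, LIBSVM selects the kernel through its -t flag. A sketch of the standard choices follows; the -s 0 (C-SVC) type and the parameter values are placeholder assumptions, not tuned recommendations.

% -t 0: linear, -t 1: polynomial, -t 2: RBF, -t 3: sigmoid
opts_linear = '-s 0 -t 0 -c 1 -q';
opts_poly   = '-s 0 -t 1 -d 3 -c 1 -q';   % polynomial degree via -d
opts_rbf    = '-s 0 -t 2 -g 0.5 -c 1 -q'; % RBF width via -g (gamma)
model = svmtrain(class_training, feature_training, opts_rbf);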
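For point 3, class weights are set with -w<label> <weight>, which scales the C parameter of that class (documented for C-SVC). A sketch that penalizes mistakes on class 2, the class with the fewest instances in the question's matrix, more heavily; the weight of 5 is an illustrative assumption.

% Penalize errors on class 2 five times more than on classes 1 and 3.
weighted_options = '-s 0 -t 2 -c 1 -w1 1 -w2 5 -w3 1 -q';
model = svmtrain(class_training, feature_training, weighted_options);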
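For point 4, a sketch of computing per-class precision, recall, and F1 from a confusion matrix, so the cross-validation can be scored on F1 instead of accuracy. ConMat is assumed to have true classes in rows and predicted classes in columns, as returned by confusionmat(known, predicted).

% Per-class precision, recall, and F1 from a confusion matrix
% (guard against division by zero if a class is never predicted).
tp = diag(ConMat);
precision = tp ./ sum(ConMat, 1)'; % column sums = predicted counts
recall    = tp ./ sum(ConMat, 2);  % row sums = true counts
f1 = 2 * (precision .* recall) ./ (precision + recall);
macroF1 = mean(f1);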