
I want to perform cross-validation to select the best parameters C and gamma for the RBF kernel of an SVR (Support Vector Regression). I'm using LIBSVM. I have a database that contains 4 groups of 3D meshes. My question is: is the approach I am using below OK for 4-fold cross-validation? I think that, to select the parameters C and gamma of the RBF kernel, I must minimize the error between the predicted values and the ground-truth values.

I also have another problem: I get a NaN value during the cross-validation (Squared correlation coefficient = nan (regression)).

Here is the code I wrote:

[C,gamma] = meshgrid(-5:2:15, -15:2:3); % ranges of exponents for C and gamma

%# grid search, and cross-validation

for m=1:numel(C)

    for k=1:4 
        fid1 = fopen(sprintf('list_learning_%d.txt',k), 'rt'); 
        i=1;

        while feof(fid1) == 0 
            tline = fgetl(fid1); 
            v= load(tline);
            v=normalize(v);
            matrix_feature_tmp(i,:)=v;
            i=i+1;
        end  

        fclose(fid1);

        % I fill matrix_feature_train of size m by n via matrix_feature_tmp

        %%construction of the test matrix
       fid2 = fopen(sprintf('liste_features_test%d.txt',k), 'rt'); 
       i=1;

       while feof(fid2) == 0 
           tline = fgetl(fid2); 
           v= load(tline);
           v=normalize(v);
           matrix_feature_test_tmp(i,:)=v;
           i=i+1;
       end  

       fclose(fid2);

       %I fill matrix_feature_test of size m by k via matrix_feature_test_tmp

       mos_learning=load(sprintf('mos_learning_%d.txt',k));
       mos_wanted=load(sprintf('mos_test%d.txt',k));

       model = svmtrain(mos_learning, matrix_feature_train', ...
               sprintf('-s %d -t %d -c %f -g %f -p %f', 3, 2, 2^C(m), 2^gamma(m), 1));

       [y_hat, Acc, projection] = svmpredict(mos_wanted, ...
                                             matrix_feature_test', model);
       MSE_Test = mean((y_hat-mos_wanted).^2);
       vecc_error(k)=MSE_Test;

    end
    mean_vec_error_fold(m) = mean(vecc_error);
end

%select the best gamma and C 
[~,idx]=min(mean_vec_error_fold);

best_C = 2^C(idx);
best_gamma = 2^gamma(idx);

%training with best parameters
%for example
model = svmtrain(mos_learning1, matrix_feature_train1', ...
        sprintf('-s %d -t %d -c %f -g %f -p %f', 3, 2, best_C, best_gamma, 1));

[y_hat_final, Acc, projection] = svmpredict(mos_test1, ...
                                 matrix_feature_test1', model);

1 Answer


Based on your description, without reading your code, it sounds like you are NOT doing cross-validation. Cross-validation requires you to pick a parameter set (i.e. a value for C and gamma), hold those parameters constant, train on k-1 folds and test on the remaining fold, and do this k times so that each fold is used as the test set exactly once. Then aggregate the error/accuracy measure over these k tests; that aggregate is the measure you use to rank this parameter set for a model trained on ALL the data. Call this your cross-validation error for the parameter set you used. You then repeat this process for a range of different parameters and choose the parameter set with the best accuracy / lowest CV error. Your final model is trained on all your data with those chosen parameters.
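
The procedure described above can be sketched as follows. This is written in Python with scikit-learn purely for illustration (the question uses MATLAB/LIBSVM); the data here is synthetic and the parameter ranges mirror the exponent grids in the question.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Synthetic regression data, standing in for the mesh features / MOS scores
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X[:, 0] + 0.1 * rng.normal(size=80)

# Full Cartesian grid of (C, gamma) pairs, as powers of two
param_grid = [(2.0**c, 2.0**g) for c in range(-5, 16, 2)
                               for g in range(-15, 4, 2)]

kf = KFold(n_splits=4, shuffle=True, random_state=0)
cv_error = []
for C, gamma in param_grid:
    fold_mse = []
    # Hold (C, gamma) fixed; vary only which fold is the test set
    for train_idx, test_idx in kf.split(X):
        model = SVR(kernel='rbf', C=C, gamma=gamma, epsilon=1.0)
        model.fit(X[train_idx], y[train_idx])
        y_hat = model.predict(X[test_idx])
        fold_mse.append(np.mean((y_hat - y[test_idx]) ** 2))
    # Aggregate over the k folds: this is the CV error for this pair
    cv_error.append(np.mean(fold_mse))

# Pick the pair with the lowest CV error, then retrain on ALL the data
best_C, best_gamma = param_grid[int(np.argmin(cv_error))]
final_model = SVR(kernel='rbf', C=best_C, gamma=best_gamma, epsilon=1.0).fit(X, y)
```

Note the nesting: the fold loop sits inside the parameter loop, and the final model is fit once on the whole dataset with the winning pair.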

Your code doesn't really make sense to me. Looking at this snippet:

folds = 4;
for i=1:numel(C)
    cv_acc(i) = svmtrain(ground_truth, matrice_feature_train', ...
                sprintf('-s %d -t %d -c %f -g %f -p %d -v %d', 3, 2, ...
                2^C(i), 2^gamma(i), 1, folds)); % RBF kernel
end

What is it that cv_acc contains? To me it looks like it contains the actual SVM model (an SVMStruct if you use the MATLAB toolbox, something else if you used LIBSVM). This would be OK IF you were using your loop to change which folds are used as the training set. However, you have used it to change the values of your gamma and C parameters, which is incorrect. You later call min(cv_acc), so I'm now guessing that you think the call to svmtrain actually returned the training error? I don't see how you can meaningfully call min on an array of structures like that, but I could be wrong. (Note that when LIBSVM's svmtrain is given the -v option, it does not return a model at all: it runs its own cross-validation and returns a single number, the CV accuracy for classification or the CV mean squared error for regression.) But even so, you aren't actually interested in minimising your training error; you want to minimise your cross-validation error, which is the aggregate of the test errors from your k runs and has nothing to do with your training error.

Now it's impossible to know for sure whether you've done this part wrong, since you don't show us the vectors of gamma and C, but it's strange to have only one loop rather than a nested loop to iterate through them (unless you have arranged them like a truth table, but I doubt that). You need to test each potential value of C paired with each value of gamma. Currently it looks like you're only trying one value of gamma for each value of C.
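
For what it's worth, a meshgrid over the two exponent ranges is exactly such a truth table: flattening it and looping over a single flat index does visit every (C, gamma) pair once. A quick check, written in Python/NumPy for illustration but mirroring the MATLAB ranges from the question:

```python
import numpy as np

# Same exponent ranges as the question's meshgrid(-5:2:15, -15:2:3)
C_exp, gamma_exp = np.meshgrid(np.arange(-5, 16, 2), np.arange(-15, 4, 2))

# Flattening pairs every C exponent with every gamma exponent exactly once
pairs = {(int(c), int(g)) for c, g in zip(C_exp.ravel(), gamma_exp.ravel())}
print(len(pairs))  # 110: 11 C values x 10 gamma values
```

So a single loop over 1:numel(C) is equivalent to the nested loop, provided C and gamma really were built with meshgrid.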

Have a look at this answer to see an example of cross-validation used with SVM.