
I'm trying to use LIBSVM (through its MATLAB interface) for a regression problem. I have a dataset of 192 instances. Here is my code to split the data into training and test sets:

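% data: 192-by-d feature matrix, label: 192-by-1 target vector
idx = [zeros(170,1); ones(22,1)];      % 170 training rows, 22 test rows
idx = idx(randperm(numel(idx)));       % shuffle which rows go where
train       = data(idx==0,:);
train_label = label(idx==0,:);
test        = data(idx==1,:);
test_label  = label(idx==1,:);

% epsilon-SVR (-s 3), RBF kernel (-t 2), C = 1, gamma = 0.01
model_1 = svmtrain(train_label, train, '-s 3 -t 2 -c 1 -g 0.01');   % training set only
model_2 = svmtrain(label, data, '-s 3 -t 2 -c 1 -g 0.01');          % whole dataset

[y_hat, Acc, Dec] = svmpredict(test_label, test, model_1);          % or model_2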

If I train on the whole dataset (model_1), I get a different predicted value for each instance of the test set, but if I train on the training set only, I get exactly the same value for every test record. I thought the training set might be too small to train a good model, so I tried using 190 instances for training and only 2 for testing. Even with this split, I get the same predicted value for both test instances. Is there something wrong with the code?

Judging by your code, it seems that model_2 is the one where you use the full data set for training (not model_1 as per your post). - Marc Claesen
Exactly, but why do I obtain different values only when I use the whole dataset? If I use 190 instances for training and only 2 for testing (which is almost the same as using the whole dataset), I still obtain the same predicted values. - Titus Pullo

1 Answer


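You should scale your features. Try scaling both the training and the test data in your code, using the minimum and range computed on the training set for both. Most likely, without scaling, the squared distances between a test point and the support vectors are so large that the RBF kernel value exp(-g*||x - x_i||^2) is practically zero for all of them, so the prediction collapses to the same constant (the bias term) for every test point. When the test points are also in the training set, each one is at distance zero from itself, which would explain why the model trained on the whole dataset still gives different values.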
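A minimal MATLAB sketch of the effect and the fix, assuming the features are on large or very different scales (the question doesn't show `data`, so the matrices below are made-up stand-ins). It computes the RBF kernel between test and training points with the question's `-g 0.01`, first on the raw features and then after mapping each feature to [-1, 1] with the training set's minimum and range (the default range of LIBSVM's `svm-scale` tool). It uses only core functions (`bsxfun`, `min`, `max`, `exp`), so it should also run in Octave, and it does not call LIBSVM itself.

```matlab
% Illustrative stand-in data (NOT from the question): one feature in the
% thousands and one between 0 and 1.
Xtrain = [1000 0.5; 2000 0.1; 3000 0.9; 4000 0.3];
Xtest  = [2500 0.7; 3500 0.2];
g      = 0.01;                       % same value as '-g 0.01' in the question

% Pairwise RBF kernel K(i,j) = exp(-g * ||A(i,:) - B(j,:)||^2)
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
rbf    = @(A, B) exp(-g * sqdist(A, B));

% Unscaled: squared distances are about 2.5e5, so every kernel value
% underflows to 0 and the SVR output is the same constant for each test point.
K_raw = rbf(Xtest, Xtrain);

% Min-max scaling to [-1, 1]; min and range come from the TRAINING set only
lo   = min(Xtrain, [], 1);
span = max(Xtrain, [], 1) - lo;
span(span == 0) = 1;                 % avoid dividing by zero for constant features
scale01 = @(X) bsxfun(@rdivide, bsxfun(@minus, X, lo), span);
scaleX  = @(X) 2 * scale01(X) - 1;

Xtrain_s = scaleX(Xtrain);
Xtest_s  = scaleX(Xtest);            % test data reuses the training parameters

% Scaled: kernel values are non-zero, so predictions can vary with the input
K_scaled = rbf(Xtest_s, Xtrain_s);
```

In the question's code, `lo` and `span` would be computed from `train`, and the same `scaleX` applied to both `train` and `test` before `svmtrain` and `svmpredict`. Reusing the training parameters for the test set keeps both sets on the same scale.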