4 votes

First, I want to say I searched a lot before posting but didn't find anything about getting a score greater than 1.0 (100%).

So first I used GridSearchCV to pick my model, but I left out the gamma parameter because the grid search was getting stuck. Here is my code:

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

svr = SVR()
param_grid = {'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
              'C': [1, 5, 10, 15],
              'degree': [3, 6, 9, 12, 15]}
gscv = GridSearchCV(svr, param_grid, cv=6)
gscv.fit(features, ranks)
print(gscv.best_score_)
print(gscv.score(features, ranks) * -1)

and it returns:

-1.02488175821

0.583772756529
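One thing I might try to get gamma back into the search without it stalling is a coarser grid run in parallel. This is only an untested sketch; the gamma values and n_jobs=-1 are guesses, not something I've validated:

param_grid = {'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
              'C': [1, 10],
              'gamma': [0.01, 0.1, 1.0]}
gscv = GridSearchCV(SVR(), param_grid, cv=6, n_jobs=-1)  # n_jobs=-1 uses all cores
gscv.fit(features, ranks)
print(gscv.best_params_, gscv.best_score_)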

Then I used an SVR with the best parameters returned by the grid search, adding gamma and epsilon as parameters. I cross-validated the new model using KFold and printed the scores according to different metrics (MSE, MAE, R^2), but they return very different results.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import r2_score
from sklearn.metrics import explained_variance_score

kf = KFold(n_splits=10)
svr = SVR(kernel='rbf', C=10, epsilon=0.001, gamma=0.1)
scores = []
r2 = []
mae = []
mse = []
evs = []
for train_index, test_index in kf.split(ranks):
    x_train, x_test = features.iloc[train_index], features.iloc[test_index]
    y_train, y_test = ranks[train_index], ranks[test_index]
    svr.fit(x_train, y_train)
    predictions = svr.predict(x_test)
    mse.append(mean_squared_error(y_test, predictions))
    r2.append(r2_score(y_test, predictions))
    # note: median_absolute_error, not mean absolute error
    mae.append(median_absolute_error(y_test, predictions))
    evs.append(explained_variance_score(y_test, predictions))
    scores.append(svr.score(x_test, y_test))

print('Classifier .score : {}'.format(np.asarray(scores).mean() * -1))
print('MSE score : {}'.format(np.asarray(mse).mean()))
print('R^2 score: {}'.format(np.asarray(r2).mean() * -1))
print('MAE score: {}'.format(np.asarray(mae).mean()))
print('EVS score: {}'.format(np.asarray(evs).mean() * -1))

This prints out:

Classifier .score : 1.0535368037228126

MSE score : 0.004624197990041222

R^2 score: 1.0535368037228126

MAE score: 0.033673630575293226

EVS score: 1.0293436224922894

But if I change gamma to 0.2, it prints this:

Classifier .score : 0.5945396153777264

MSE score : 0.0035847763619656497

R^2 score: 0.5945396153777264

MAE score: 0.023670574621059648

EVS score: 0.5778668299600922

  1. So my question is: what am I doing wrong?
  2. What am I supposed to do in this case?
  3. How is it possible to get a score higher than 1.0?
  4. Why are some of the scores around 0.02? (I couldn't find what the highest value is for this scoring method, but in the sklearn documentation I saw an example around 0.8.)

1 Answer

1 vote

3 : Indeed, R2 shouldn't be greater than 1, but you get a score higher than 1 because you multiply by (-1). And there is absolutely no reason R2 can't be negative; you can check the docs for r2_score: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html A negative value just means that your model didn't work so well...
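For example, a minimal sketch (same features and ranks as in your code) that reads the per-fold R2 without the sign flip; cross_val_score with scoring='r2' is standard scikit-learn:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

svr = SVR(kernel='rbf', C=10, epsilon=0.001, gamma=0.1)
# R^2 per fold: at most 1.0, and negative when the model does
# worse than always predicting the mean of y.
r2_scores = cross_val_score(svr, features, ranks, cv=10, scoring='r2')
print(r2_scores.mean())  # no * -1 -- the sign flip is what pushed it past 1.0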

4 : If I'm not wrong, MSE and MAE are indicators you should use for regression; it's not natural to use them for classification. Note that they are error measures (lower is better), so a value like 0.02 isn't on the same 0-to-1 scale as R2 at all. I think you should stick to R2. If you need more indicators on a classification problem, you can try this:

from sklearn.metrics import classification_report

# clf stands for a fitted *classifier* here; classification_report
# doesn't apply to continuous regression targets like your ranks.
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))

Also, just a remark: svr.score computes R2 automatically, so you don't need metrics.r2_score.
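A quick sketch of that equivalence, assuming a fitted svr and one held-out x_test/y_test split as in your loop:

from sklearn.metrics import r2_score

# For regressors, .score is defined as the R^2 of the predictions,
# so these two lines print the same value.
print(svr.score(x_test, y_test))
print(r2_score(y_test, svr.predict(x_test)))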

2 : Maybe you should try different models like random forest, XGBoost, extra trees, KNN...
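For instance, a minimal comparison loop, assuming the same features/ranks; the model list and hyperparameters are only illustrative defaults:

from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

models = {
    'random forest': RandomForestRegressor(n_estimators=100),
    'extra trees': ExtraTreesRegressor(n_estimators=100),
    'knn': KNeighborsRegressor(n_neighbors=5),
}
for name, model in models.items():
    # Mean R^2 over 10 folds; higher is better, 1.0 is the ceiling.
    r2 = cross_val_score(model, features, ranks, cv=10, scoring='r2')
    print('{}: {:.3f}'.format(name, r2.mean()))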

Hope that helped! Good luck.