I am doing multi-class classification on unbalanced classes. I'm using SGDClassifier(), GradientBoostingClassifier(), RandomForestClassifier(), and LogisticRegression() with class_weight='balanced'. To compare the results, I need to compute the accuracy. I tried to compute a weighted accuracy as follows:

n_samples = len(y_train)
weights_cof = float(n_samples)/(n_classes*np.bincount(data[target_label].as_matrix().astype(int))[1:])
sample_weights = np.ones((n_samples,n_classes)) * weights_cof
print accuracy_score(y_test, y_pred, sample_weight=sample_weights)

y_train is a binary array, so sample_weights has the same shape as y_train: (n_samples, n_classes). When I run the script, I get the following error:

Update:

 Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 424, in <module>
    predict_country(featuresDF, score, featuresLabel, country_sample_size, 'gbc')
  File "D:/Destiny/DestinyScripts/MainLocationAware.py", line 313, in predict_country
    print accuracy_score(y_test, y_pred, sample_weight=sample_weights)
  File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 183, in accuracy_score
    return _weighted_sum(score, sample_weight, normalize)
  File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 108, in _weighted_sum
    return np.average(sample_score, weights=sample_weight)
  File "C:\ProgramData\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1124, in average
    "Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
So, do you want to make us guess which line is throwing the error? - juanpa.arrivillaga
@juanpa.arrivillaga The error is related to the accuracy_score() function. - YNR
Why don't you just post the full error message and the stack trace? - juanpa.arrivillaga
In the error message you are passing accuracy_score(y_test, y_pred, sample_weight=weights_cof), not accuracy_score(y_test, y_pred, sample_weight=sample_weights) as in the code you posted. - juanpa.arrivillaga
@juanpa.arrivillaga I also replaced weights_cof with sample_weights to see whether that resolved the error, but it did not. - YNR

1 Answer

The error suggests that the shape of your sample_weights differs from the shape of the scores computed from your y_test/y_pred arrays. Basically, accuracy_score creates a boolean array of per-sample matches (y_test == y_pred) and passes it, along with sample_weights, to np.average. One of the first checks in np.average is that, when no axis is given, the input array and the weights have the same shape, which apparently is not the case here.
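A minimal sketch of that shape check, using made-up toy arrays rather than your data (the names sample_score, weights_2d and weights_1d are hypothetical). It only illustrates how np.average reacts to a 1-D score array combined with 2-D versus 1-D weights:

```python
import numpy as np

# Hypothetical toy data: 4 samples, 3 classes.
sample_score = np.array([True, False, True, True])  # 1-D per-sample matches
weights_2d = np.ones((4, 3))                         # (n_samples, n_classes), as in the question
weights_1d = np.array([1.0, 2.0, 1.0, 0.5])          # one weight per sample

error_message = None
try:
    # Shapes (4,) and (4, 3) differ and no axis is given, so this raises.
    np.average(sample_score, weights=weights_2d)
except (TypeError, ValueError) as exc:
    error_message = str(exc)

# With one weight per sample the shapes match and the call succeeds.
weighted_mean = np.average(sample_score, weights=weights_1d)

print(error_message)
print(weighted_mean)
```

The second call works because the weights have the same shape as the per-sample scores.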

Update

Your comment "sample_weights, y_test, and y_pred have the same shape (n_samples, n_classes)" exposes the issue. According to the documentation for accuracy_score, y_true and y_pred (in your case y_test and y_pred) should be 1-dimensional for a multi-class problem, and sample_weight needs one weight per sample, i.e. shape (n_samples,). Are you perhaps using one-hot encoded labels? If so, you should convert them to single-value class labels and then try accuracy_score again.
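Here is a rough sketch of that conversion, assuming the labels really are one-hot encoded. The arrays are invented for illustration, and deriving the weights from the test labels with the "balanced" formula n_samples / (n_classes * class_count) is my assumption about what you intended, not something taken from your script. It also assumes every class occurs at least once, otherwise the division fails:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical one-hot labels: 6 samples, 3 classes (illustrative only).
y_test_onehot = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0],
                          [0, 1, 0], [0, 1, 0], [0, 0, 1]])
y_pred_onehot = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0],
                          [0, 1, 0], [0, 0, 1], [0, 0, 1]])

# Collapse each one-hot row to a single class index -> shape (n_samples,).
y_test_labels = y_test_onehot.argmax(axis=1)
y_pred_labels = y_pred_onehot.argmax(axis=1)

n_samples = len(y_test_labels)
n_classes = y_test_onehot.shape[1]

# "Balanced" class weights: rarer classes get larger weights.
class_counts = np.bincount(y_test_labels, minlength=n_classes)
class_weights = n_samples / (n_classes * class_counts)

# Look up one weight per sample from its true class -> shape (n_samples,).
sample_weights = class_weights[y_test_labels]

weighted_acc = accuracy_score(y_test_labels, y_pred_labels,
                              sample_weight=sample_weights)
print(weighted_acc)
```

With weights chosen this way, the weighted accuracy equals the average of the per-class recalls, so a majority class cannot dominate the score.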