
I am trying to do multi-label classification using sklearn's cross_val_score function (http://scikit-learn.org/stable/modules/cross_validation.html).

from sklearn import cross_validation
from sklearn.metrics import make_scorer, f1_score
scores = cross_validation.cross_val_score(clf, X_train, y_train,
        cv=10, scoring=make_scorer(f1_score, average=None))

I want the F1-score for each label to be returned. This sort of works for the first fold, but raises an error right after:

ValueError: scoring must return a number, got [ 0.55555556  0.81038961  0.82474227  0.67153285  0.76494024  0.89087657 0.93502377  0.11764706  0.81611208] (<type 'numpy.ndarray'>)

I assume this error is raised because cross_val_score expects the scorer to return a single number. Is there any other way I can use cross_val_score to get the F1-score per label?
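A quick check along these lines supports that assumption. With average=None, f1_score returns a NumPy array holding one value per label, which is not a numbers.Number, whereas an averaged score (here average="macro") is a single float that passes such a check. The toy labels below are made up purely for illustration.

```python
import numbers

from sklearn.metrics import f1_score

# Toy multiclass labels, invented for illustration only.
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 1, 2]

# average=None: one F1 value per label, returned as a NumPy array.
per_label = f1_score(y_true, y_pred, average=None)
# average="macro": the unweighted mean over labels, a single float.
macro = f1_score(y_true, y_pred, average="macro")

# The array fails an isinstance(..., numbers.Number) test; the averaged score passes.
print(type(per_label).__name__, isinstance(per_label, numbers.Number))
print(type(macro).__name__, isinstance(macro, numbers.Number))
```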


2 Answers


I solved the problem by modifying .../scikit-learn/sklearn/cross_validation.py. Specifically, I commented out these lines (lines 1651-1653 in my copy of the file):

    if not isinstance(score, numbers.Number):
        raise ValueError("scoring must return a number, got %s (%s) instead."
                         % (str(score), type(score)))

This removes the check that the score is a number, so the NumPy array of per-label scores is passed through instead of raising an error.
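A sketch of an alternative that avoids editing the installed library (note that sklearn.cross_validation was removed in scikit-learn 0.20): loop over the folds yourself and call f1_score with average=None on each one. This assumes a recent scikit-learn with sklearn.model_selection and a multiclass target; the synthetic make_classification data and the LogisticRegression model are placeholders for the question's clf, X_train and y_train.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Synthetic placeholders for the question's clf, X_train and y_train.
X_train, y_train = make_classification(n_samples=300, n_features=10, n_informative=5,
                                       n_classes=3, random_state=0)
clf = LogisticRegression(max_iter=1000)

labels = np.unique(y_train)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in cv.split(X_train, y_train):
    # Fit a fresh copy of the estimator on each training fold.
    model = clone(clf).fit(X_train[train_idx], y_train[train_idx])
    y_pred = model.predict(X_train[test_idx])
    # average=None returns one F1 value per entry in `labels`, in that order.
    fold_scores.append(f1_score(y_train[test_idx], y_pred, labels=labels, average=None))

fold_scores = np.vstack(fold_scores)  # shape: (n_folds, n_labels)
print(fold_scores.mean(axis=0))       # mean F1 per label across the folds
```

Passing labels explicitly keeps the columns aligned across folds even if a fold's predictions happen to miss a class.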


This question is a bit old, but the following may be useful to anyone with a similar requirement for multiclass problems. As of scikit-learn 0.23.1, you can pass your own dictionary of scorers:

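from sklearn.metrics import (make_scorer, accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, f1_score)
from sklearn.model_selection import cross_validate

custom_scorer = {'accuracy': make_scorer(accuracy_score),
                 'balanced_accuracy': make_scorer(balanced_accuracy_score),
                 'precision': make_scorer(precision_score, average='macro'),
                 'recall': make_scorer(recall_score, average='macro'),
                 'f1': make_scorer(f1_score, average='macro'),
                 }
# cross_val_score accepts only a single scorer; a dict of scorers needs cross_validate
scores = cross_validate(clf, X_train, y_train, cv=10, scoring=custom_scorer)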
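The same dictionary mechanism can also give one F1 score per label, which is what the question asked for: build one scorer per label by passing labels=[k] to f1_score, so each scorer returns a single float (the F1 of label k alone) and passes the number check. A sketch, assuming scikit-learn's cross_validate and a multiclass target; the data and LogisticRegression model are placeholders, and the f1_label_<k> key names are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_validate

# Placeholder data and model standing in for the question's clf, X_train, y_train.
X_train, y_train = make_classification(n_samples=300, n_features=10, n_informative=5,
                                       n_classes=3, random_state=0)
clf = LogisticRegression(max_iter=1000)

# One scorer per label: restricting `labels` to [k] makes the macro average
# equal to the F1 score of label k alone, so each scorer returns a scalar.
per_label_scorers = {
    f"f1_label_{k}": make_scorer(f1_score, labels=[k], average="macro")
    for k in np.unique(y_train)
}

results = cross_validate(clf, X_train, y_train, cv=10, scoring=per_label_scorers)
for name in per_label_scorers:
    # cross_validate stores each metric under "test_<name>", one value per fold.
    print(name, results[f"test_{name}"].mean())
```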