1 vote

I want to use Scikit-Learn's GridSearchCV to run a bunch of experiments and then print out the recall, precision, and f1 of each experiment.

This article (https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html) suggests that I need to run .fit and .predict multiple times.

...
scores = ['precision', 'recall']
...
for score in scores:
    ...
    clf = GridSearchCV(
        SVC(), tuned_parameters, scoring='%s_macro' % score
    )
    clf.fit(X_train, y_train) # running for each scoring metric
    ...
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    ...
    y_true, y_pred = y_test, clf.predict(X_test) # running for each scoring metric
    print(classification_report(y_true, y_pred))

I would like to just run .fit once and log all of the recall, precision, and f1 metrics. So for example, something along the lines of:

clf = GridSearchCV(
    SVC(), tuned_parameters, scoring=['recall', 'precision', 'f1'] # I don't think this syntax is even possible
)

clf.fit(X_train, y_train)

for metric in clf.something_that_i_cannot_find:
    ### does something like this exist?
    print(metric['precision'])
    print(metric['recall'])
    print(metric['f1'])
    ###:end does something like this exist?

Or maybe even:

...
for run in clf.something_that_i_cannot_find:
    ### does something like this exist?
    print(classification_report(run.y_true, run.y_pred))
    ###:end does something like this exist?

This article (Scoring in Gridsearch CV) suggests that GridSearchCV can be made aware of multiple scorers, but I still can't figure out how to access each of those scores for all of the experiments.
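For concreteness, I gather from that article that the call might look roughly like the following (the exact scorer names are just my guess), but I still don't know what to inspect afterwards:

clf = GridSearchCV(
    SVC(),
    tuned_parameters,
    scoring={'precision': 'precision_macro',
             'recall': 'recall_macro',
             'f1': 'f1_macro'},
    refit='f1'  # apparently something like this is needed when several scorers are given?
)
clf.fit(X_train, y_train)
# ...and then which attribute holds the per-metric results?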

Is what I'm looking for not supported by GridSearchCV? Is the method used in the article (i.e., running .fit and .predict multiple times) the easiest way to accomplish something similar to what I'm asking for?

Thank you for your time!

You are going to have to do it manually, which would take a lot of code: use folds from scikit-learn and loop over the parameters. I would suggest setting the random state and running the grid search three times. – Ibrahim Sherif
Thank you for the suggestion. I'll take that approach. If you want to type up your comment as an answer, I'll accept it to close the loop on this. – Zhao Li

2 Answers

2 votes

You can do multiple-metric evaluation with GridSearchCV on binary classification. I ran into a ValueError: Multi-class not supported when I tried to apply it to the iris dataset.

I have implemented it on basic binary data below, where I calculate four different scores:

['AUC', 'F1', 'Precision', 'Recall']

Note: the idea is not to draw conclusions from the model, but only to show how multiple-metric evaluation works; the data is just random data.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score, make_scorer
from sklearn.svm import SVC

X, y = datasets.make_classification(n_classes=2, random_state=0)

# The scorers can be either one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
f1_scorer = make_scorer(f1_score, average='binary')
scoring = {'AUC': 'roc_auc', 'F1': f1_scorer, 'Precision': 'precision', 'Recall': 'recall'}

# split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = GridSearchCV(
    SVC(),
    param_grid={'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    scoring=scoring,
    refit='AUC',  # refit the best estimator according to AUC
    return_train_score=True
)
clf.fit(X_train, y_train)
results = clf.cv_results_
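If you just want to print the scores instead of plotting them, everything is already in results: for every metric there are keys such as 'mean_test_AUC' and 'std_test_F1', aligned with results['params']. A minimal sketch (the loop variable names are mine):

# print the mean/std test score of every metric for every parameter combination
for i, params in enumerate(results['params']):
    print(params)
    for name in scoring:  # 'AUC', 'F1', 'Precision', 'Recall'
        mean = results['mean_test_%s' % name][i]
        std = results['std_test_%s' % name][i]
        print("  %s: %0.3f (+/-%0.3f)" % (name, mean, std * 2))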


Plotting the result:

plt.figure(figsize=(10, 10))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("C")
plt.ylabel("Score")

ax = plt.gca()
ax.set_xlim(1, 1000)
ax.set_ylim(0.40, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_C'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k', 'b', 'r']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)

    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid(False)
plt.show()

Output plot: train and test scores for each scorer plotted against C.

1 vote

You are going to have to do it manually, which would take a lot of code: use folds from scikit-learn and loop over the parameters. I would suggest setting the random state for the fold strategy, the grid search, and the model, and running the grid search three times, once for each metric.
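For example, here is a rough sketch of the kind of manual loop I mean (the data and parameter grid are just placeholders; precision, recall, and f1 are computed per fold and then averaged):

import numpy as np
from sklearn import datasets
from sklearn.model_selection import ParameterGrid, StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = datasets.make_classification(n_classes=2, random_state=0)
param_grid = {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}

# fixed random state so every parameter combination sees the same folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for params in ParameterGrid(param_grid):
    precisions, recalls, f1s = [], [], []
    for train_idx, test_idx in cv.split(X, y):
        model = SVC(**params).fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        precisions.append(precision_score(y[test_idx], y_pred))
        recalls.append(recall_score(y[test_idx], y_pred))
        f1s.append(f1_score(y[test_idx], y_pred))
    print(params,
          "precision=%.3f recall=%.3f f1=%.3f"
          % (np.mean(precisions), np.mean(recalls), np.mean(f1s)))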