
I am using sklearn.model_selection.GridSearchCV and sklearn.model_selection.cross_val_score, and I got an unexpected result.

In my example I use the following imports:

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV
import numpy as np

First, I create a random data set:

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

Next, I define a pipeline "generator":

def my_pipeline(C=None):
    if C is None:
        return Pipeline(
            [
                ('step1', StandardScaler()),
                ('clf', LinearSVC(random_state=42))
            ])
    else:
        return Pipeline(
            [
                ('step1', StandardScaler()),
                ('clf', LinearSVC(C=C, random_state=42))
            ])        
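The C=None branch is not strictly needed: LinearSVC already defaults to C=1.0, so a single code path builds the same pipeline when no C is given. A minimal sketch, keeping the step names step1 and clf from above so that the clf__C grid key still works:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def my_pipeline(C=1.0):
    """Scaler + LinearSVC pipeline.

    C=1.0 is LinearSVC's own default, so my_pipeline() with no argument
    gives the same model as the original C=None branch.
    """
    return Pipeline(
        [
            ("step1", StandardScaler()),
            ("clf", LinearSVC(C=C, random_state=42)),
        ]
    )


# GridSearchCV changes C through the step name, equivalent to:
tuned = my_pipeline().set_params(clf__C=10)
print(tuned.named_steps["clf"].C)
```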

Next, I set a couple of C values to test:

Cs = [0.01, 0.1, 1, 2, 5, 10, 50, 100]

Lastly, I would like to check the maximal recall_score that can be obtained. I do it once using cross_val_score and once directly using GridSearchCV:

np.max(
    [
        np.mean(
            cross_val_score(my_pipeline(C=c), X, y,
                            cv=3,
                            scoring=make_scorer(recall_score)))
        for c in Cs
    ])
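Taking np.max over the means discards which C produced it. A small variant, assuming the same X, y, Cs and pipeline as above (repeated so the snippet runs on its own), keeps each mean in a dict so the winning C can be compared with GridSearchCV's best_params_. The string scoring="recall" is the built-in equivalent of make_scorer(recall_score).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
Cs = [0.01, 0.1, 1, 2, 5, 10, 50, 100]


def my_pipeline(C=1.0):
    return Pipeline([("step1", StandardScaler()),
                     ("clf", LinearSVC(C=C, random_state=42))])


# Unweighted mean recall over the 3 folds, kept per C.
mean_recall = {
    c: np.mean(cross_val_score(my_pipeline(C=c), X, y, cv=3,
                               scoring=make_scorer(recall_score)))
    for c in Cs
}
best_C = max(mean_recall, key=mean_recall.get)
print(best_C, mean_recall[best_C])
```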

and:

GridSearchCV(
    my_pipeline(),
    {
        'clf__C': Cs
    },
    scoring=make_scorer(recall_score),
    cv=3
).fit(X, y).best_score_
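In current scikit-learn, the per-candidate averages behind best_score_ are stored in cv_results_ (the older grid_scores_ attribute no longer exists), and in recent releases mean_test_score is the plain mean of the fold scores. A sketch, assuming a recent release and the setup above, that prints them so they can be compared row by row with the cross_val_score loop:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
Cs = [0.01, 0.1, 1, 2, 5, 10, 50, 100]


def my_pipeline(C=1.0):
    return Pipeline([("step1", StandardScaler()),
                     ("clf", LinearSVC(C=C, random_state=42))])


search = GridSearchCV(
    my_pipeline(),
    {"clf__C": Cs},
    scoring=make_scorer(recall_score),
    cv=3,
).fit(X, y)

# One entry per candidate C: the mean test score used to rank candidates.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params["clf__C"], score)

print("best:", search.best_params_, search.best_score_)
```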

In my example, the former yields 0.85997883750571147 and the latter 0.85999999999999999. I was expecting the values to be the same. What did I miss?

I put it all in a gist as well.

Edit: Fixing cv. I replaced cv=3 with StratifiedKFold(n_splits=3, random_state=42) and the results didn't change. As a matter of fact, it seems like cv doesn't influence the result.
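This matches how the folds are built: for a classifier and an integer cv, both cross_val_score and GridSearchCV use StratifiedKFold(n_splits=3) without shuffling, and those folds depend only on the labels, so random_state has nothing to act on (recent scikit-learn releases reject random_state together with shuffle=False with a ValueError). A small check, assuming a current scikit-learn; fold_indices is a hypothetical helper:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)


def fold_indices(splitter):
    """Hypothetical helper: test indices of each fold, as a list of arrays."""
    return [test for _, test in splitter.split(X, y)]


def same_folds(a, b):
    return all((fa == fb).all() for fa, fb in zip(fold_indices(a), fold_indices(b)))


# What cv=3 becomes for a classifier: no shuffling, folds fixed by y.
plain = StratifiedKFold(n_splits=3)
plain_same = same_folds(plain, plain)

# random_state only matters with shuffle=True; an int seed keeps the
# shuffled folds reproducible across calls and across instances.
shuffled_a = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
shuffled_b = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
shuffled_same = same_folds(shuffled_a, shuffled_b)

print(plain_same, shuffled_same)
```

To make both tools use one explicit split, the same splitter object can be passed as cv= to cross_val_score and to GridSearchCV.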

A very quick first guess here is that it is something to do with the state of the random number generator at the point that the data are split into folds for cross validation. What happens if you fix the random_state in both GridSearchCV and cross_val_score? - Angus Williams
Good guess, but... wrong. @AngusWilliams check out the update. - Dror

Answer


To me it looks like a precision issue. If you look at the full list of scores, for cross_val_score you get the following:

[0.85193468484717316,
 0.85394271697568724,
 0.85995478921674717,
 0.85995478921674717,
 0.8579467570882332,
 0.86195079720077905,
 0.81404660558401265,
 0.82201861337565829]

and for GridSearchCV you get the following:

[mean: 0.85200, std: 0.02736, params: {'clf__C': 0.01},
 mean: 0.85400, std: 0.02249, params: {'clf__C': 0.1},
 mean: 0.86000, std: 0.01759, params: {'clf__C': 1},
 mean: 0.86000, std: 0.01759, params: {'clf__C': 2},
 mean: 0.85800, std: 0.02020, params: {'clf__C': 5},
 mean: 0.86200, std: 0.02275, params: {'clf__C': 10},
 mean: 0.81400, std: 0.01916, params: {'clf__C': 50},
 mean: 0.82200, std: 0.02296, params: {'clf__C': 100}]

So each pair of corresponding scores is basically the same, up to small precision differences (it seems that the scores in GridSearchCV are rounded).
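Another factor may be worth checking. In older scikit-learn releases, GridSearchCV had an iid parameter (default True, since removed) that averaged fold scores weighted by each fold's number of test samples, while np.mean(cross_val_score(...)) is an unweighted mean. With folds of roughly 333 samples each, the two averages would differ only slightly, which would be consistent with the small gaps above. The sketch below, assuming the question's data and C=1, computes both averages from the same folds; the weighted line mirrors the old iid=True averaging as I understand it and is not a current API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=3)  # what cv=3 means for a classifier
model = Pipeline([("step1", StandardScaler()),
                  ("clf", LinearSVC(C=1, random_state=42))])

fold_scores = cross_val_score(model, X, y, cv=cv,
                              scoring=make_scorer(recall_score))
test_sizes = np.array([len(test) for _, test in cv.split(X, y)])

# Unweighted mean, as in np.mean(cross_val_score(...)).
plain_mean = fold_scores.mean()
# Mean weighted by test-fold size, as the old iid=True averaging did.
weighted_mean = np.average(fold_scores, weights=test_sizes)

print(plain_mean, weighted_mean, abs(plain_mean - weighted_mean))
```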