Python scikit-learn (using grid_search.GridSearchCV)

Question

I'm using grid search to fit machine learning model parameters.

I typed in the following code (modified from the sklearn documentation page, http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html):

from sklearn import svm, grid_search, datasets, cross_validation

# getting data
iris = datasets.load_iris()

# grid of parameters
parameters = {'kernel':('linear', 'poly'), 'C':[1, 10]}

# predictive model (support vector machine)
svr = svm.SVC()

# cross validation procedure
mycv = cross_validation.StratifiedKFold(iris.target, n_folds = 2)

# grid search engine
# cv must be passed by keyword: the third positional argument is scoring, not cv
clf = grid_search.GridSearchCV(svr, parameters, cv=mycv)

# fitting engine
clf.fit(iris.data, iris.target)

However, when I look at clf.estimator, I get the following:

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

How did I end up with a 'rbf' kernel? I didn't specify it as an option in my parameters.

What's going on?

Thanks!

P.S. I'm using version '0.15-git' of sklearn.

Addendum: I noticed that clf.best_estimator_ gives the right output. So what is clf.estimator doing?

I believe in your parameters dictionary the kernel key should have a list as its values. i.e. ['linear', 'poly'] (square brackets). rbf just showed up because it is the default. — gobrewers14
Thanks. So clf.estimator doesn't really do anything? It's more like a placeholder for default values? — monkeybiz7
estimator is an object of the GridSearchCV class. If you create an instance of this class, i.e. clf, .estimator will return the object and in this case, since your initial code was erroneous, it returned the default. — gobrewers14
Got it! Thanks! Although fixing the code to 'kernel':['linear', 'poly'] still returns kernel='rbf' for the clf.estimator attribute. — monkeybiz7
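
As a quick check of the tuple-versus-list point raised in the comments above, here is a minimal sketch. It assumes a recent scikit-learn release, where ParameterGrid is imported from sklearn.model_selection rather than from the sklearn.grid_search module used in the question; the variable names are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid written two ways: kernel values as a tuple and as a list.
grid_with_tuple = {'kernel': ('linear', 'poly'), 'C': [1, 10]}
grid_with_list = {'kernel': ['linear', 'poly'], 'C': [1, 10]}

# Expand each grid into the candidate parameter settings the search would try.
candidates_tuple = list(ParameterGrid(grid_with_tuple))
candidates_list = list(ParameterGrid(grid_with_list))

# Both spellings expand to the same candidates, none of which uses 'rbf'.
print(candidates_tuple == candidates_list)
for params in candidates_tuple:
    print(params)
```

Under that assumption, both grids expand to the same four candidates, which is consistent with the observation that switching to square brackets did not change clf.estimator.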

DavidS · Accepted Answer · 2014-06-06T22:43:19

clf.estimator is simply a copy of the estimator passed as the first argument to the GridSearchCV object. Any parameters not grid searched over are determined by this estimator. Since you did not explicitly set any parameters for the SVC object svr, it was given all default values, including kernel='rbf'. Therefore, because clf.estimator is just a copy of svr, printing the value of clf.estimator returns an SVC object with default parameters, regardless of what the search finds. Had you instead written, e.g.,

svr = svm.SVC(C=4.3)

then the value of clf.estimator would have been:

SVC(C=4.3, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

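There is no real value to the user in accessing clf.estimator after fitting, since it only holds the unfitted template that was passed in. By scikit-learn convention, anything learned during fit is stored in an attribute whose name ends with "_", such as clf.best_estimator_ and clf.best_params_. clf.estimator has no trailing "_" because it is a constructor argument, not a result of the search.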
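
The difference between the template and the search result can be seen side by side in the sketch below. It is not the asker's exact code: it assumes a recent scikit-learn release, where GridSearchCV and StratifiedKFold live in sklearn.model_selection and StratifiedKFold takes n_splits instead of the labels and n_folds. In such releases clf.estimator is the very object that was passed in, not a copy.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, StratifiedKFold

iris = datasets.load_iris()
parameters = {'kernel': ['linear', 'poly'], 'C': [1, 10]}

# Template estimator: C is set explicitly, kernel is left at its default.
svr = svm.SVC(C=4.3)

# Assumption: modern API, so the folds are described by n_splits only.
mycv = StratifiedKFold(n_splits=2)
clf = GridSearchCV(svr, parameters, cv=mycv)
clf.fit(iris.data, iris.target)

# The template is stored untouched: default kernel, the C given above.
print(clf.estimator.kernel)
print(clf.estimator.C)

# The search results live in the attributes ending with an underscore.
print(clf.best_params_)
print(clf.best_estimator_.kernel)
```

The first two prints describe svr as it was constructed, while best_params_ and best_estimator_ only ever contain kernel values taken from the grid.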
