29
votes

I'm new to sklearn's Pipeline and GridSearchCV features. I am trying to build a pipeline which first does RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:

import numpy as np
from sklearn.decomposition import RandomizedPCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer versions

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()

pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])

parameters = {'ridge__alpha': 10 ** np.linspace(-5, 2, 3)}

grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1, scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])

I know about the RidgeCV function, but I want to try out Pipeline and GridSearchCV.

I want GridSearchCV to report RMSE, but that doesn't seem to be supported in sklearn, so I'm making do with MSE. However, the scores it reports are negative:

In [41]: grid_search.grid_scores_
Out[41]: 
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
 mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
 mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]

Obviously this isn't possible for mean squared error - what am I doing wrong here?


5 Answers

44
votes

Those scores are negated MSE values, i.e. negate them and you get the MSE back. GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE have to be negated.
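
For example (a sketch: the question's sklearn version exposes these means via grid_scores_, while newer versions expose the same numbers via cv_results_['mean_test_score']), you can flip the sign to get the MSE back and take the square root if you want the RMSE the question asked about:

import numpy as np

# mean_test_score holds the (negated) MSE for each parameter setting
neg_mse = np.array(grid_search.cv_results_['mean_test_score'])
mse = -neg_mse           # negate to recover the MSE
rmse = np.sqrt(mse)      # RMSE, if that's what you want to report
print(list(zip(grid_search.cv_results_['params'], rmse)))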

5
votes

An alternative when creating the GridSearchCV is to build your own scorer with make_scorer and set the greater_is_better flag to False.

So, if clf is your classifier and parameters is your hyperparameter grid, you can use make_scorer like this:

from sklearn.metrics import make_scorer, mean_squared_error

# define your own MSE scorer and set greater_is_better=False
mse = make_scorer(mean_squared_error, greater_is_better=False)

Now, as shown below, you can call GridSearchCV and pass in the mse scorer you defined:

grid_obj = GridSearchCV(clf, parameters, cv=5, scoring=mse, n_jobs=-1, verbose=True)
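
One usage note (my own addition, keeping clf and parameters as the placeholders above, with X_train/y_train as hypothetical training data): because greater_is_better=False makes make_scorer negate the loss internally, best_score_ will still come out negative, so flip the sign when reporting:

grid_obj.fit(X_train, y_train)       # X_train / y_train are placeholder names
best_mse = -grid_obj.best_score_     # negate to recover the actual MSE
print(grid_obj.best_params_, best_mse)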
0
votes

Suppose I have stored the negative MSE and negative MAE results obtained from GridSearchCV in lists named model_nmse and model_nmae, respectively.

I would simply multiply them by -1 to get the desired MSE and MAE scores.

import numpy as np

# flip the sign of each stored score to recover the positive MSE / MAE values
model_mse = list(np.multiply(model_nmse, -1))
model_mae = list(np.multiply(model_nmae, -1))
0
votes

If you want RMSE as a metric, you can write your own callable that takes Y_pred and Y_org and computes the RMSE.

ref
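
A minimal sketch of such a callable (the names rmse and rmse_scorer are mine), wrapped with make_scorer as in the earlier answer so that GridSearchCV treats it as a loss to minimize:

import numpy as np
from sklearn.metrics import make_scorer

def rmse(y_org, y_pred):
    # root mean squared error between the true and predicted values
    return np.sqrt(np.mean((np.asarray(y_org) - np.asarray(y_pred)) ** 2))

# greater_is_better=False, so GridSearchCV minimizes it (reported scores come back negated)
rmse_scorer = make_scorer(rmse, greater_is_better=False)

# e.g. GridSearchCV(pca_ridge, parameters, cv=2, scoring=rmse_scorer)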

0
votes

You can see the available scoring options in the documentation.

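For example (my own sketch; recent sklearn versions let you list the valid scoring strings programmatically, including 'neg_mean_squared_error' and 'neg_root_mean_squared_error'):

from sklearn.metrics import get_scorer_names  # sklearn >= 1.0

# prints every built-in scoring string accepted by GridSearchCV's scoring parameter
print(sorted(get_scorer_names()))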