4 votes

I am using sklearn's randomized regression models, such as RandomizedLogisticRegression. Because randomized logistic regression uses an L1 penalty, it requires setting the regularization parameter C (or alpha in the Lasso case).

To find a good value for C, I usually use a simple GridSearchCV like the one below.

But RandomizedLogisticRegression() does not support GridSearchCV, because it performs bootstrapping internally. Instead, I tried using a plain LogisticRegression with GridSearchCV:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

params = {'C': [0.1, 1, 10]}
logi = LogisticRegression(penalty='l1')   # L1 penalty, as in the randomized version
clf = GridSearchCV(logi, params, cv=10)
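
To be concrete, this is roughly how I read off the chosen C; X and y here stand for my feature matrix and labels, which I have not shown:

clf.fit(X, y)                       # X, y: training data (assumed, not shown)
best_C = clf.best_params_['C']      # the C value GridSearchCV selected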

I could get a C value this way; however, no features were selected when I applied that C to RandomizedLogisticRegression. Perhaps the C chosen by GridSearchCV was too low.

So I would like to know whether there is another good way to determine a fair value of C (or alpha) when using randomized regression.

There was a similar question before, but I think that answer was about ordinary (non-randomized) regression.

Can anyone give me an idea please?

What about cross-validation? – Riyaz
Unfortunately, using LogisticRegressionCV() produced a similar result to GridSearchCV(): the best C value was too small, and the coefficients of all features were 0. – ToBeSpecific

1 Answer

4 votes

Because RandomizedLogisticRegression is used for feature selection, it needs to be cross-validated as part of a pipeline. You can apply GridSearchCV to a Pipeline that contains it as a feature-selection step, followed by the classifier of your choice. An example might look like:

from sklearn.linear_model import LogisticRegression, RandomizedLogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
  ('fs', RandomizedLogisticRegression()),   # feature-selection step
  ('clf', LogisticRegression())             # final classifier
])

params = {'fs__C': [0.1, 1, 10]}

grid_search = GridSearchCV(pipeline, params)
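
As a rough usage sketch (X and y stand for your training data, which are not in the question), you can then fit the grid search and read the chosen C from best_params_; assuming RandomizedLogisticRegression exposes the usual selector interface, get_support() on the fitted step shows which features were kept:

grid_search.fit(X, y)                       # X, y: training data (assumed)

print(grid_search.best_params_)             # best C for the feature-selection step

# mask of features retained by the tuned feature-selection step
fs = grid_search.best_estimator_.named_steps['fs']
print(fs.get_support())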