3
votes

Python scikit-learn's SGDClassifier() supports L1, L2, and elastic net regularization, so it seems important to find the optimal value of the regularization parameter.

I was advised to use SGDClassifier() with GridSearchCV() to do this, but SGDClassifier exposes only the regularization parameter alpha. If I use loss functions such as the hinge loss (SVM) or log loss (logistic regression), I think there should be a C parameter to optimize instead of alpha. Is there any way to tune the optimal regularization parameter in SGDClassifier() when using logistic regression or SVM?

In addition, I have one more question about the iteration parameter n_iter: I do not understand what this parameter means. Does it work like bagging when used together with the shuffle option? So, if I use the l1 penalty and a large value of n_iter, would it work like RandomizedLasso()?


1 Answer

10
votes

C and alpha have the same effect; the difference is a choice of terminology. C is proportional to 1/alpha, so you can use GridSearchCV to select alpha in exactly the same way you would select C. Just remember that the directions are flipped: a higher C means weaker regularization and is more likely to overfit, and likewise a lower alpha means weaker regularization and is more likely to overfit.

L2 will produce a model with many small coefficients, whereas L1 will produce a model with a large number of zero coefficients and a few large coefficients. Elastic net is a combination of the two.

SGDClassifier uses stochastic gradient descent, in which the data is fed through the learning algorithm sample by sample. n_iter tells it how many full passes (epochs) to make over the training data. This is not bagging: every pass still uses all of the data, and shuffle only changes the order in which samples are visited within each pass. As the number of iterations goes up and the learning rate goes down, SGD becomes more like batch gradient descent, but it becomes slower as well.