
I am using the Python scikit-learn library for classification.

As a feature selection step, I want to use RandomizedLogisticRegression().

So to find the best value of C by cross-validation, I used LogisticRegressionCV(penalty='l1', solver='liblinear'). However, all coefficients were 0 in this case. Using the l2 penalty works without problems, and a single run of LogisticRegression() with the l1 penalty seems to give proper coefficients.
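To illustrate, a minimal sketch of the single-run case that works (using a synthetic make_classification dataset as a stand-in for my real data):

```python
# Sketch with synthetic data standing in for my real dataset (assumption).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# A single run with the l1 penalty: this does produce nonzero coefficients.
clf = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
clf.fit(X, y)
print((clf.coef_ != 0).sum())  # number of selected (nonzero) features
```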

I am using RandomizedLasso and LassoCV() as a work-around, but I am not sure whether it is proper to use LASSO with binary class labels.

So my questions are:

  1. Is there some problem with using LogisticRegressionCV() in my case?
  2. Is there another way to find the best value of C_ for logistic regression, other than GridSearchCV()?
  3. Is it possible to use LASSO for binary (not continuous) classification?
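For context, my current work-around looks roughly like this (synthetic data again; the RandomizedLasso step is omitted):

```python
# Rough sketch of the LassoCV work-around on binary labels (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# LassoCV treats the 0/1 class labels as continuous regression targets.
lasso = LassoCV(cv=5).fit(X, y)
selected = (lasso.coef_ != 0)
print(selected.sum(), "features selected, alpha_ =", lasso.alpha_)
```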
1) All coefficients being zero means that your L1 prior is too strong. 2) You do not have to cross-validate, but you do need an independent test set for adjusting the strength of the L1 penalty. Cross-validation is almost always superior, so why not simply use it? 3) Logistic regression with an L1 prior on the weights is exactly that. - cel
@cel Thank you very much for your comment. But I am sorry, I could not understand some points, because I am not good at statistics. On question 2: I think using a single LogisticRegression on a separate test set may cause overfitting, but is that enough? And on 3: L1-penalised logistic regression and Lasso regression seem to behave quite differently, at least in scikit-learn. I thought that is because Lasso regression in scikit-learn treats the binary class labels 0 and 1 as continuous values. Is there some way to deal with this problem? - z991
For using machine learning, a deep understanding of statistics is certainly helpful, but not strictly necessary. However, you do need a deep understanding of training/testing and of how and why cross-validation works. Without that knowledge you will make a lot of mistakes, so if you don't have it yet, go and read, read, read. In general: whenever you tune parameters, you always have to add an additional layer of independent evaluation, otherwise you can run into overfitting. I don't understand your problem with the difference between lasso and logistic regression with an L1 penalty. - cel
You will not find any resources that will magically make everything simple. Formulas will get simpler once you understand the core principles. Reading Wikipedia is always a good start. Try to grasp the concepts, not the formulas. Developing a deep understanding requires you not only to understand the formulas, but also to get an intuition for what is behind them. This is difficult even for people from more technical fields. - cel
@cel Thank you for the kind advice. I'd better start step by step, with a simple and small dataset. - z991

1 Answer


From what you describe, I can say that the coefficient of the l1 regularisation term is too high in your case, and you need to decrease it.

When that coefficient is very high, the regularisation term becomes more important than the error term, so your model just becomes very sparse and doesn't predict anything.

I checked LogisticRegressionCV: by default it searches over C values from 1e-4 to 1e4, controlled by the Cs argument. Note that C is the inverse of the regularisation strength, so lower regularisation means higher values of C. If you pass an integer as Cs, it picks that many C values on a log scale in this range; alternatively, you can provide the C values (the inverses of the regularisation coefficients) yourself as a list.

So play with the Cs parameter and try to lower the regularisation coefficient, i.e. push the search towards larger C values.
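A sketch of what I mean (synthetic data; the exact Cs range you need will depend on your dataset):

```python
# Sketch: lower the regularisation by offering larger C values (C = 1/lambda).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# Explicit list of inverse regularisation strengths; larger C = weaker penalty.
clf = LogisticRegressionCV(Cs=[0.1, 1.0, 10.0, 100.0],
                           penalty='l1', solver='liblinear', cv=5)
clf.fit(X, y)
print("chosen C:", clf.C_, "nonzero coefficients:", (clf.coef_ != 0).sum())
```

If the chosen C_ ends up at the top of your list, extend the list with even larger values.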