1 vote

I am using the logistic regression model in scikit-learn (specifically, LogisticRegressionCV). When I use the default tol value (1e-4) and test the model with different random_state values, the feature coefficients do not fluctuate much; at the very least, I can tell which features are important.

However, when I set a higher tol value (e.g., 2.3), the feature coefficients fluctuate wildly between runs: where in one trial feature A has a coefficient of -0.9, in the next run it might be 0.4.

This makes me think that the correct (or preferable) tol value is the one that produces the most consistent results.

Below is the related part of my code:

from sklearn.linear_model import LogisticRegressionCV

classifier = LogisticRegressionCV(penalty='l1', class_weight='balanced',
                                  # tol=2.3,
                                  solver='liblinear')
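
To make the fluctuation concrete, here is a minimal sketch of the experiment (make_classification is a hypothetical stand-in for my actual ~400-sample, 12-feature dataset):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for the actual data (~400 samples, 12 features)
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           random_state=0)

coefs = []
for seed in range(5):
    clf = LogisticRegressionCV(penalty='l1', class_weight='balanced',
                               solver='liblinear', tol=2.3,
                               random_state=seed)
    clf.fit(X, y)
    coefs.append(clf.coef_.ravel())

# Large per-feature standard deviations across seeds indicate unstable
# coefficients; with the default tol=1e-4 they stay much smaller.
print(np.round(np.std(coefs, axis=0), 3))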

I wonder if there are any guidelines for determining an appropriate tol value.

A lower tol is almost always what you want; otherwise you risk the algorithm not converging, which is what I think you're seeing here. My suggestion would be to leave it at the default unless you have some huge, complex dataset. – piman314
Thanks for your comment @ncfirth! I have around 12 features (my sample size is around 400), and I wanted to penalize some irrelevant features strongly to avoid overfitting. I was increasing tol for that reason. Now I think tol has nothing to do with this, but I am not sure. Thanks again! – renakre
Ah, no, I think you want to look at the Cs parameter. I would suggest starting with LogisticRegressionCV(Cs=20, cv=3). You could also think about doing some feature selection beforehand. – piman314
Oh, thanks a lot for the info! So I should use higher Cs values to penalize more strongly? I want to do L1 regularization; can I do it beforehand? Maybe you should put your comments together as an answer. – renakre
I feel like we've gone off topic, so I'll just keep it in the comments, thanks though. A higher integer searches more values; if you want strong regularisation, I'd use something like Cs=[1e-10, 1e-8, .., 1e-2]. No, you use L1 during the fitting; I meant something like this. – piman314
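
As a minimal sketch of that suggestion, an explicit Cs grid can be passed instead of an integer (the log-spaced values below are one hypothetical expansion of the elided list in the comment above):

import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical log-spaced grid of inverse regularisation strengths;
# smaller C values mean stronger L1 penalties.
Cs = list(np.logspace(-10, -2, 5))  # 1e-10, 1e-08, 1e-06, 1e-04, 1e-02

classifier = LogisticRegressionCV(Cs=Cs, cv=3, penalty='l1',
                                  class_weight='balanced',
                                  solver='liblinear')

LogisticRegressionCV then picks the best C from this grid by cross-validation, so strong regularization is explored without touching tol.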

1 Answer

3 votes

The tol parameter tells the optimization algorithm when to stop. If the value of tol is too big, the algorithm stops before it can converge. Here is what the docs say:

tol : float
        Stopping criterion. For the newton-cg and lbfgs solvers, the iteration
        will stop when ``max{|g_i | i = 1, ..., n} <= tol``
        where ``g_i`` is the i-th component of the gradient.

It should have a similar meaning for the liblinear solver. If you are interested in the details, the newGLMNET algorithm that the liblinear library uses to solve L1-regularized logistic regression is described here and here.
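
For illustration, here is a minimal sketch (on synthetic data via make_classification, not the asker's dataset) of how an overly large tol cuts the solver off early; the n_iter_ attribute shows how many iterations actually ran:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

for tol in (1e-4, 2.3):
    clf = LogisticRegression(penalty='l1', solver='liblinear', tol=tol)
    clf.fit(X, y)
    # A loose tol stops the solver after only a handful of iterations,
    # long before the coefficients have settled.
    print(f"tol={tol}: n_iter_={clf.n_iter_}")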