I am using logistic regression in scikit-learn (specifically, LogisticRegressionCV). When I use the default tol value (1e-4) and test the model with different random_state values, the feature coefficients do not fluctuate much; at least, I can see which features are important.

However, when I set a higher tol value (e.g., 2.3), the feature coefficients fluctuate strongly from run to run: a feature that has a coefficient of -0.9 in one trial may get 0.4 in the next. This makes me think that the correct (or favorable) tol value is the one that gives the most consistent results.
Below is the relevant part of my code:
from sklearn.linear_model import LogisticRegressionCV

classifier = LogisticRegressionCV(penalty='l1', class_weight='balanced',
                                  # tol=2.2,  (default is 1e-4)
                                  solver='liblinear')
I wonder whether there are any guidelines for determining the appropriate tol value.
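One rough way to quantify this fluctuation is to refit the model under several random_state values and measure the per-feature spread of the coefficients for each tol setting. Below is a minimal sketch of that idea; make_classification is only a hypothetical stand-in for the real data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for the real data; substitute your own X, y.
X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=4, random_state=0)

spreads = {}
for tol in (1e-4, 2.3):
    coefs = []
    for seed in range(5):
        clf = LogisticRegressionCV(penalty='l1', class_weight='balanced',
                                   solver='liblinear', tol=tol,
                                   random_state=seed)
        clf.fit(X, y)
        coefs.append(clf.coef_.ravel())
    # Per-feature range of each coefficient across the five seeds,
    # reduced to the single worst-case spread for this tol.
    spreads[tol] = np.ptp(np.array(coefs), axis=0).max()
    print(f"tol={tol}: max coefficient spread across seeds = {spreads[tol]:.4f}")
```

A large spread for a given tol suggests the solver is stopping before the coefficients have stabilized, so differences between runs reflect the optimizer's stopping point rather than the data.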
Comments:

The default tol is almost always what you want, otherwise you risk the algorithm not converging, which is what I think you're seeing here. My suggestion would be to leave it at the default unless you have some huge, complex dataset. – piman314

… tol for that reason. Now, I think tol has nothing to do with this. But I do not know. Thanks again! – renakre

… the Cs parameter. I would suggest starting with LogisticRegressionCV(Cs=20, cv=3). Also, you could think about some feature selection beforehand. – piman314

No, you use L1 during the fitting; I meant something like this: Cs=[1e-10, 1e-8, .., 1e-2] – piman314
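Put together, piman314's Cs suggestion could look like the sketch below. The data is again a hypothetical stand-in, and the explicit-grid comment shows an illustrative grid of my own, not the commenter's exact values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical stand-in data; substitute the real X, y.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cs=20 generates 20 candidate C values on a log scale between 1e-4 and 1e4;
# an explicit grid (e.g. Cs=[1e-4, 1e-3, 1e-2, 1e-1, 1.0]) can be passed instead.
clf = LogisticRegressionCV(Cs=20, cv=3, penalty='l1',
                           solver='liblinear', class_weight='balanced')
clf.fit(X, y)
print("C selected by cross-validation:", clf.C_)
```

This keeps tol at its default and instead tunes the regularization strength C by cross-validation, which is the knob that actually controls how aggressively the L1 penalty shrinks coefficients.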