0 votes

In scikit-learn, C is the inverse of the regularization strength (link). I manually ran three trainings with the same parameters and conditions, except that I used three different values of C (0.1, 1.0, and 10.0). I compared the F-scores on the validation set and identified the "best" C. However, someone told me this is wrong, as I am not supposed to use the validation set to optimize C. How should I pick the right C? And what justification do I have if I choose the default C (= 1.0) from scikit-learn?


1 Answer

1 vote

How should I pick the right C?

You are supposed to have a three-way split of your data: training, validation, and test sets. You fit the model on the training set, tune hyperparameters (such as C) on the validation set, and only at the end evaluate on the test set. In particular, when data is small, you can do this in a nested k-fold CV fashion: an outer CV loop produces the train-test splits, and an inner CV loop splits each training portion further into actual training and validation folds.
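A minimal sketch of this nested scheme with scikit-learn (the synthetic dataset, the LogisticRegression estimator, and the grid of C values are illustrative assumptions; swap in your own data and classifier):

```python
# Sketch of nested cross-validation: the inner loop selects C on
# validation folds, the outer loop evaluates that whole selection
# procedure on held-out test folds.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Toy binary classification data (replace with your own X, y).
X, y = make_classification(n_samples=200, random_state=0)

# Inner loop: for each candidate C, fit on the training folds and
# score (here with F1, as in the question) on the validation fold.
inner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring="f1",
    cv=3,
)

# Outer loop: each outer test fold is never seen during C selection,
# so the resulting scores are an unbiased estimate of performance.
scores = cross_val_score(inner, X, y, scoring="f1", cv=5)
print(scores.mean())
```

Note that this reports an estimate of the tuned model's performance; to obtain a final model, you would refit the inner search on all the data and use its `best_params_`.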

And what justification do I have if I choose the default C (= 1.0) from scikit-learn?

There is no justification beyond it placing an arbitrary prior on the weights (so any other value would be equally well justified).