I am new to machine learning. I've been following some online tutorials in which they fit a logistic regression to the MNIST data using the Python scikit-learn library. The tutorial shows that the default 'liblinear' solver is slow on the training set of 60,000 images, so it suggests using the 'lbfgs' solver instead.
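For reference, the setup described above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's code. It assumes scikit-learn is installed and uses the small bundled `load_digits` dataset (1,797 8x8 images) as a stand-in for the 60,000 MNIST images, so it runs without a download. The `StandardScaler` step and `max_iter=1000` are assumptions added to help `lbfgs` converge. Note that since scikit-learn 0.22, `lbfgs` is already the default solver, so older tutorials may describe a different default.

```python
import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# load_digits is a small, bundled stand-in for MNIST (assumption: used only
# so the sketch runs offline; the real MNIST set has 60,000 training images).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling the pixel features usually helps lbfgs converge in fewer iterations.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="lbfgs", max_iter=1000),
)

# Time only the training step.
start = time.perf_counter()
clf.fit(X_train, y_train)
elapsed = time.perf_counter() - start

# score() returns mean accuracy on the held-out test set.
test_accuracy = clf.score(X_test, y_test)
print(f"lbfgs: fit in {elapsed:.2f}s, test accuracy {test_accuracy:.3f}")
```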

However, the user guide suggests that this solver is suitable only for small datasets:

The “lbfgs” solver is recommended for use for small data-sets but for larger datasets its performance suffers. [9]

I am familiar with statistics, where a small dataset usually means fewer than 100 observations. How do I justify the choice of this solver here, and how should I think about sample size in this case? Should that simply be based on intuition/performance, or are there some strict criteria?
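One practical way to relate the "small dataset" advice to your own data is to measure how fit time grows with the number of training samples. The sketch below is an illustration under assumptions: it uses synthetic data from `make_classification` (50 features and 10 classes are arbitrary choices, not MNIST's shape), compares `lbfgs` with `saga` (the `LogisticRegression` docstring notes that 'sag' and 'saga' are faster on large datasets), and caps iterations so it runs quickly. Timings depend on hardware and data, so treat them only as a relative comparison.

```python
import time
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# The capped max_iter may stop saga early; silence that warning for the demo.
warnings.simplefilter("ignore", ConvergenceWarning)

# Synthetic stand-in data (assumed sizes, chosen only so the sketch is fast).
X, y = make_classification(
    n_samples=8000, n_features=50, n_informative=20, n_classes=10, random_state=0
)

# Fit each solver on growing prefixes of the data and record the fit time.
fit_times = {}
for solver in ("lbfgs", "saga"):
    for n in (1000, 2000, 4000, 8000):
        clf = LogisticRegression(solver=solver, max_iter=200)
        start = time.perf_counter()
        clf.fit(X[:n], y[:n])
        fit_times[(solver, n)] = time.perf_counter() - start
        print(f"{solver:>5}, n={n:>5}: {fit_times[(solver, n)]:.3f}s")
```

If one solver's time grows much faster than the other's as `n` doubles, that is a concrete, data-specific sign that your dataset is "large" for it.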

You know, there is the time it takes to compute a fit, and there is the accuracy on the test set. Both get called "performance", and I believe you are mixing up the two. - branco
The choice of solver is affected by (i) computation time and (ii) [possibly] performance metrics. To satisfy the former, you choose a faster algorithm. The latter is subject to cross-validation. There is no other magic than that. - Sergey Bushmanov
Setting aside cross-validation, the computation time is indeed affected by the choice of solver. The quote above says that the "lbfgs" solver is recommended for small datasets. So what counts as a small dataset? In principle I could try all the available solvers and see which one is fastest, but since I am already told that "lbfgs" is good for smaller datasets, I would like to translate that advice to my real situation, where I have 60,000 images. Should I consider that a small dataset? - Darya Shcherbakova
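The two criteria raised in the comments above (computation time and cross-validated performance) can be measured together with `cross_validate`, which reports both `fit_time` and `test_score` for each fold. A minimal sketch, again assuming the bundled `load_digits` data as a stand-in for MNIST and an arbitrary `max_iter=300`. `liblinear` is left out because it handles multiclass problems only with a one-vs-rest scheme, so it is not a like-for-like comparison.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# sag/saga may stop at the capped max_iter; ignore that warning for the demo.
warnings.simplefilter("ignore", ConvergenceWarning)

X, y = load_digits(return_X_y=True)

# For each solver, record mean fit time and mean accuracy over 5 CV folds.
results = {}
for solver in ("lbfgs", "newton-cg", "sag", "saga"):
    model = make_pipeline(
        StandardScaler(), LogisticRegression(solver=solver, max_iter=300)
    )
    cv = cross_validate(model, X, y, cv=5)
    results[solver] = (cv["fit_time"].mean(), cv["test_score"].mean())

# Print solvers from fastest to slowest mean fit time.
for solver, (fit_time, accuracy) in sorted(results.items(), key=lambda item: item[1][0]):
    print(f"{solver:>9}: mean fit {fit_time:.3f}s, mean CV accuracy {accuracy:.3f}")
```

If the accuracies come out close, the choice reduces to fit time, which is the comparison the comments describe.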

1 Answer

It's not about which solver to use. Logistic regression gives comparatively low accuracy on MNIST because it is a linear classifier: it only draws linear decision boundaries (hyperplanes) between the categories. If you instead use neural networks, convolutional neural networks, or an SVM with any kernel other than 'linear', they will give better results, provided their hyperparameters are well tuned.
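To check this claim on your own data, a linear and a non-linear model can be compared with cross-validation. This sketch assumes scikit-learn, uses `load_digits` as a small stand-in for MNIST, and leaves `C` and `gamma` at their defaults, so neither model is tuned. It compares logistic regression with an RBF-kernel `SVC` only, not the neural networks mentioned above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# A linear model and a kernel (non-linear) model, both with default settings.
models = {
    "logistic regression (linear)": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM, RBF kernel": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean accuracy over 5 cross-validation folds for each model.
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```

On the full 60,000-image MNIST set, a kernel SVM is typically much slower to train than logistic regression, so the time-versus-accuracy trade-off from the question applies to the model choice as well.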

The choice of solver does affect how much time training takes, but I would suggest using one of the models above instead.