I am new to machine learning. I've been following some online tutorials that fit a logistic regression to the MNIST data using Python's scikit-learn library. The default 'liblinear' solver is shown to be slow on the training set of 60,000 images, so the tutorial suggests using the 'lbfgs' solver instead.
However, the user guide suggests that this solver is suitable only for small datasets:
The “lbfgs” solver is recommended for use for small data-sets but for larger datasets its performance suffers. [9]
I am familiar with statistics, where a small data set usually means fewer than 100 observations. How do I justify the choice of this solver here, and how should I think about sample size in this context? Should the choice simply be based on intuition and measured performance, or are there strict criteria?
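To make the "based on performance" option concrete, here is a minimal sketch of how the two solvers could be timed on growing sample sizes. It is only an illustration. It uses a synthetic binary dataset from make_classification as a stand-in for MNIST. The helper name time_solvers, the sample sizes, the feature count and max_iter=1000 are my own assumptions, not something taken from the tutorial or the user guide.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def time_solvers(sample_sizes=(500, 2000, 8000), n_features=64,
                 solvers=("liblinear", "lbfgs"), seed=0):
    """Return {(solver, n_samples): (fit_seconds, train_accuracy)}.

    Hypothetical helper for illustration only.
    """
    results = {}
    for n in sample_sizes:
        # Synthetic binary problem; a stand-in for MNIST, not the real data.
        X, y = make_classification(n_samples=n, n_features=n_features,
                                   n_informative=20, random_state=seed)
        for solver in solvers:
            clf = LogisticRegression(solver=solver, max_iter=1000)
            start = time.perf_counter()
            clf.fit(X, y)
            elapsed = time.perf_counter() - start
            # Accuracy on the training data, only as a sanity check
            # that both solvers reach a comparable fit.
            results[(solver, n)] = (elapsed, clf.score(X, y))
    return results


if __name__ == "__main__":
    for (solver, n), (secs, acc) in sorted(time_solvers().items()):
        print(f"{solver:>9}  n={n:<6} fit={secs:.3f}s  train_acc={acc:.3f}")
```

The timings depend on the machine, the scikit-learn version and the data, so a run like this shows how the fit time grows with the number of samples on one particular setup. It does not give a general threshold for what counts as "small".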