Interpretation of learning curve in ScikitLearn concerning epochs

Question

I am new to machine learning and am currently using ScikitLearn's MLPClassifier for a neural network task. Following Andrew Ng's well-known machine learning course, I am plotting the learning curve, in my case with ScikitLearn's learning_curve function (see also the documentation: https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html):

import numpy as np
from sklearn.model_selection import GroupKFold, learning_curve
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='adam', activation='relu', alpha=0.001,
                    learning_rate='constant', learning_rate_init=0.0001,
                    hidden_layer_sizes=[39, 37, 31, 34], batch_size=200,
                    max_iter=1000, verbose=True)

cv = GroupKFold(n_splits=8)

estimator = clf
ylim = (0.7, 1.01)  # y-axis limits used for the plot
n_jobs = 1
train_sizes = np.linspace(.01, 1.0, 100)

# Calculate the learning curve
# (X_array_train, Y_array_train and groups_array_train are defined elsewhere)
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X_array_train, Y_array_train,
    groups=groups_array_train, cv=cv, n_jobs=n_jobs,
    train_sizes=train_sizes, scoring='accuracy', verbose=10)

My solver for the MLPClassifier is 'adam' and the batch size is 200.

This is the resulting plot: https://i.imgur.com/jDNoEVg.png

I have three questions concerning the interpretation of such learning curves:

1.) As I understand this learning curve, it shows the training and cross-validation scores for different amounts of training data at the end of one epoch (one epoch = one forward pass and one backward pass over all the training examples). Looking at the gap between the two curves and at the scores they end up at, I can diagnose whether I have a high-bias or a high-variance problem. However, according to the verbose output of my MLPClassifier, the neural network trains over several epochs, so which epoch is shown in the curve: the first epoch of training, the last epoch, or the average score over all epochs? Or am I misunderstanding epochs altogether?

2.) Whenever a new batch starts (after 200 and 400 training examples), I get spikes. What would be the correct way to interpret them?

3.) Understanding 1.) will probably answer this as well: what makes this function so slow that you need several parallel jobs (n_jobs) to get it done in a reasonable time? clf.fit(X, y) is fast in my case.

I would be really grateful if someone could help me get a better understanding of this. I am also open to literature recommendations.

Many thanks in advance!

Just edited my code :) I think my problem is that I need to understand in more detail how adam uses the epochs. See also comment below. (S.Maria)

Jon Nordby · Accepted Answer · 2019-01-23T13:17:11

A learning curve should only be computed on a stable, generalizable model. Did you ensure that the model is not overfitting?

1) The estimator is trained to completion, i.e. to the final epoch or until an early-stopping threshold is reached. How many epochs that is depends on your estimator configuration. In fact, the learning_curve function does not have a concept of epochs at all. It can just as well be applied to classifiers that don't use epochs.

2) Your batch size is very large compared to the total number of samples. Consider a smaller batch size, maybe 50 or 20. SPECULATION: It might be that for 201 samples, you end up with one batch of 200 and one batch of 1. That batch of 1 might cause problems.

3) The learning curve trains one model per cross-validation fold for each training set size. In your case it looks like you are testing all 500 possible training sizes. With 5-fold CV, that is 2500 training rounds. Without parallelization this takes roughly 2500 times as long as one fit()+predict() (somewhat less in practice, since fits on the smaller subsets are cheaper). Instead you should sample only some training set sizes, e.g. train_sizes = numpy.linspace(0.1, 1.0, 30) for 30 points between 10% and 100% of your data. Fractional sizes must be greater than 0, so 0.0 is not a valid starting point.
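
To make points 2) and 3) concrete, here is a minimal sketch, not the asker's actual setup. The data is synthetic, and the network size, batch_size=10, 4 splits and 5 training sizes are arbitrary choices for illustration. It also assumes that MLPClassifier slices its mini-batches the same way sklearn.utils.gen_batches does, which is an assumption about the implementation rather than documented behaviour.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, learning_curve
from sklearn.neural_network import MLPClassifier
from sklearn.utils import gen_batches

# Point 2: splitting 201 samples into batches of 200 leaves a trailing batch of 1
# (assumption: MLPClassifier batches its data like gen_batches does).
batch_sizes = [s.stop - s.start for s in gen_batches(201, 200)]
print(batch_sizes)

# Synthetic stand-in data: 240 samples, 5 features, 8 groups of 30 samples each.
rng = np.random.RandomState(0)
X = rng.normal(size=(240, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
groups = np.repeat(np.arange(8), 30)

clf = MLPClassifier(hidden_layer_sizes=(10,), batch_size=10,
                    max_iter=200, random_state=0)
cv = GroupKFold(n_splits=4)

# Point 3: only a handful of training sizes instead of every possible size.
train_sizes = np.linspace(0.1, 1.0, 5)

sizes_abs, train_scores, test_scores = learning_curve(
    clf, X, y, groups=groups, cv=cv,
    train_sizes=train_sizes, scoring="accuracy")

# One full fit per (training size, CV fold) pair.
n_fits = train_scores.size
print(train_scores.shape, n_fits)
```

The score arrays have one row per training size and one column per fold, so the number of complete fits is simply the product of the two. Each of those fits runs its own epochs until max_iter or convergence, which is why the curve has no epoch axis. If you want to see how many epochs a single standalone fit used, the fitted MLPClassifier exposes that as n_iter_.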
