I am taking a course on Deep Learning in Python and I am stuck on the following lines of an example:
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
From the definitions I know, 1 epoch = going through all training examples once to do one weight update.
batch_size is used by the optimizer to divide the training examples into mini-batches, each of size batch_size.
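To make that concrete, this is roughly how I picture batch_size splitting the data (a rough sketch with made-up numbers; the X_train below is just a random placeholder, not my real data):

import numpy as np

X_train = np.random.rand(1000, 10)   # made-up data: 1000 examples, 10 features
batch_size = 32

# The optimizer works through the data in chunks of batch_size, so one pass
# over all 1000 examples gives ceil(1000 / 32) = 32 weight updates.
n_batches = int(np.ceil(len(X_train) / batch_size))
for i in range(n_batches):
    batch = X_train[i * batch_size:(i + 1) * batch_size]
    # ... compute gradients on `batch` and update the weights ...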
I am not familiar with Adam optimization, but I believe it is a variation of GD or mini-batch GD. Gradient Descent: uses one big batch (all the data), but multiple epochs. Mini-Batch Gradient Descent: uses multiple mini-batches, but only 1 epoch.
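In code, this is how I currently picture that difference (a rough sketch of my own understanding, using a made-up linear model and plain NumPy rather than Adam):

import numpy as np

# Tiny made-up dataset and linear model, just to make the two loops concrete.
X_train = np.random.rand(1000, 10)
y_train = np.random.rand(1000)
w = np.zeros(10)
lr = 0.01
batch_size = 32
num_epochs = 100

# Gradient Descent as I understand it: one big batch (all the data), multiple epochs.
for epoch in range(num_epochs):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(X_train)
    w -= lr * grad

# Mini-Batch Gradient Descent as I understand it: multiple mini-batches, only 1 epoch.
for start in range(0, len(X_train), batch_size):
    Xb = X_train[start:start + batch_size]
    yb = y_train[start:start + batch_size]
    grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)
    w -= lr * grad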
Then how come the code has both multiple mini-batches and multiple epochs? Does "epoch" in this code have a different meaning than the definition above?