The fix is to ensure batch size never changes between batches. They must all be the same size.
Method 1
One way is to use a batch size that perfectly divides your dataset into equal-sized batches. For example, if total size of data is 1500 examples, then use a batch size of 50 or 100 or some other proper divisor of 1500.
batch_size = len(data)/proper_divisor
Method 2
The other way is to ignore any batch that is less than the specified size, and this can be done using the TensorFlow Dataset API and setting the drop_remainder
to True
.
batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
steps_per_epoch = len(train_feature) // batch_size
model.fit(train_data,
epochs=10, steps_per_epoch = steps_per_epoch)
When using the Dataset API like above, you will need to also specify how many rounds of training count as an epoch (essentially how many batches to count as 1 epoch). A tf.data.Dataset
instance (the result from tf.data.Dataset.from_tensor_slices
) doesn't know the size of the data that it's streaming to the model, so what constitutes as one epoch has to be manually specified with steps_per_epoch
.
Your new code will look like this:
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))
batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mse',
metrics=[tf.keras.metrics.RootMeanSquaredError()])
steps_per_epoch = len(train_feature) // batch_size
model.fit(train_data,
epochs=10, steps_per_epoch = steps_per_epoch)
You can also include the validation set as well, like this (not showing other code):
batch_size = 64
val_data = tf.data.Dataset.from_tensor_slices((val_feature, val_label))
val_data = val_data.repeat().batch(batch_size, drop_remainder=True)
validation_steps = len(val_feature) // batch_size
model.fit(train_data, epochs=10,
steps_per_epoch=steps_per_epoch,
validation_steps=validation_steps)
Caveat: This means a few datapoints will never be seen by the model. To get around that, you can shuffle the dataset each round of training, so that the datapoints left behind each epoch changes, giving everyone a chance to be seen by the model.
buffer_size = 1000 # the bigger the slower but more effective shuffling.
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.shuffle(buffer_size=buffer_size, reshuffle_each_iteration=True)
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
Why the error occurs
Stateful RNNs and their variants (LSTM, GRU, etc.) require fixed batch size. The reason is simply because statefulness is one way to realize Truncated Backprop Through Time, by passing the final hidden state for a batch as the initial hidden state of the next batch. The final hidden state for the first batch has to have exactly the same shape as the initial hidden state of the next batch, which requires that batch size stay the same across batches.
When you set the batch size to 64, model.fit
will use the remaining data at the end of an epoch as a batch, and this may not have up to 64 datapoints. So, you get such an error because the batch size is different from what the stateful LSTM expects. You don't have the problem with batch size of 1 because any remaining data at the end of an epoch will always contain exactly 1 datapoint, so no errors. More generally, 1 is always a divisor of any integer. So, if you picked any other divisor of your data size, you should not get the error.
In the error message you posted, it appears the last batch has size of 49 instead of 64. On a side note: The reason the shapes look different from the input is because, under the hood, keras works with the tensors in time_major (i.e. the first axis is for steps of sequence). When you pass a tensor of shape (10, 15, 2) that represents (batch_size, steps_per_sequence, num_features), keras reshapes it to (15, 10, 2) under the hood.