I'm trying to train my sequential model (RNN -> GRU -> Dense) with Keras/TensorFlow 2.0 in two phases, with different loss weights in each phase. To change the loss weights, I need to recompile the model between the two phases. My problem is that training becomes much, much slower after the recompilation, and I can see no explanation other than that the GPU is no longer being used. Here is the relevant code:
import tensorflow as tf

# Build model: shared Masking -> SimpleRNN -> GRU trunk with two output heads
input_ = tf.keras.layers.Input(shape=(None, num_features))
masking = tf.keras.layers.Masking(mask_value=0.)(input_)
rnn = tf.keras.layers.SimpleRNN(24, return_sequences=True, name="rnn")(masking)
gru = tf.keras.layers.GRU(16, return_sequences=True, name="gru")(rnn)
dense1 = tf.keras.layers.Dense(5, activation=tf.nn.softmax, name="dense1")(gru)
dense2 = tf.keras.layers.Dense(1, activation=tf.math.sigmoid, name="dense2")(gru)
model = tf.keras.Model(inputs=[input_], outputs=[dense1, dense2])
# Learning rate scheduler: halve the learning rate after 7 epochs without progress
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=7, min_lr=0.0001)
# Compile and fit, phase 1
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01, clipvalue=0.1)
model.compile(optimizer=optimizer,
              loss=['categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.7, 0.3],
              sample_weight_mode="temporal",
              metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=BATCHES_PER_EPOCH, epochs=375, callbacks=[reduce_lr])
# Recompile and fit, phase 2: lower learning rate, new loss weights
optimizer.learning_rate = 0.001
model.compile(optimizer=optimizer,
              loss=['categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.99, 0.01],
              sample_weight_mode="temporal",
              metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=BATCHES_PER_EPOCH, epochs=125, callbacks=[reduce_lr])
The output at the end of phase 1 and the start of phase 2 shows that training becomes roughly five times slower:
Epoch 374/375
4/4 [==============================] - 5s 1s/step - loss: 0.1177 - dense1_loss: 0.1479 - dense2_loss: 0.0473 - dense1_accuracy: 0.9249 - dense2_accuracy: 0.9784
Epoch 375/375
4/4 [==============================] - 5s 1s/step - loss: 0.1177 - dense1_loss: 0.1479 - dense2_loss: 0.0473 - dense1_accuracy: 0.9249 - dense2_accuracy: 0.9784
Epoch 1/125
4/4 [==============================] - 27s 7s/step - loss: 0.1494 - dense1_loss: 0.1504 - dense2_loss: 0.0478 - dense1_accuracy: 0.9225 - dense2_accuracy: 0.9779
Epoch 2/125
4/4 [==============================] - 24s 6s/step - loss: 0.1603 - dense1_loss: 0.1614 - dense2_loss: 0.0545 - dense1_accuracy: 0.9201 - dense2_accuracy: 0.9750
What could be the explanation? Is the model reorganized in some way when it is recompiled, so that TensorFlow can no longer map the operations to the GPU?
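To rule out the obvious, I plan to verify device placement directly. This is just my own sanity check (not from any official troubleshooting guide): it lists the visible GPUs and logs which device each operation is assigned to. My understanding is that placement logging is best enabled at program start, before any ops are created.

import tensorflow as tf

# Log the device each op is placed on, and list the visible GPUs.
# (Best called at program start, before any ops exist.)
tf.debugging.set_log_device_placement(True)
print("Visible GPUs:", tf.config.experimental.list_physical_devices('GPU'))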
(I have tried just changing the loss weights with model.loss_weights = [0.99, 0.01], but the change has no effect on training; recompilation seems to be necessary.)
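If recompiling turns out to be the culprit, one workaround I'm considering is a custom training step in which the loss weights are ordinary arguments, so nothing needs to be recompiled between phases. This is only a sketch under my own assumptions: it reuses the model and optimizer defined above, and it ignores the temporal sample weighting, the mask, and the accuracy metrics that compile() would otherwise handle for me.

import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()
bce = tf.keras.losses.BinaryCrossentropy()

@tf.function
def train_step(x, y1, y2, w1, w2):
    with tf.GradientTape() as tape:
        pred1, pred2 = model(x, training=True)
        # Weighted sum of the two head losses; w1/w2 are plain Python
        # floats, so changing them once between phases only triggers a
        # single retrace of this function.
        loss = w1 * cce(y1, pred1) + w2 * bce(y2, pred2)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

Inside my own batch loop, phase 1 would call train_step(x, y1, y2, 0.7, 0.3) and phase 2 would call train_step(x, y1, y2, 0.99, 0.01). But I'd still like to understand why recompiling slows things down so much.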