
I have a Keras model with Embedding, LSTM, and Dropout layers, as well as the CRF implementation from keras_contrib.

I was trying to resume training a partly trained model whose weights I had previously saved. However, when I loaded it via save_load_utils.load_all_weights from keras_contrib, I received the following error:

line 108, in load_all_weights
    model.optimizer.set_weights(optimizer_weight_values)
line 113, in set_weights
    'of the optimizer (' + str(len(params)) + ')')
ValueError: Length of the specified weight list (36) does not match the number of weights of the optimizer (0)

Apparently, the optimizer's weight list has length 0. The docstring of set_weights in Keras's optimizers.py states that it "should only be called after computing the gradients (otherwise the optimizer has no weights)."
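This matches what a freshly compiled model reports. Here is a minimal sketch (assuming Keras 2.x; the toy Dense model and variable names are mine, not from this question) showing that the optimizer only acquires weights after the first gradient update:

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

toy = Sequential([Dense(1, input_shape=(4,))])
toy.compile(optimizer='adam', loss='mse')
print(len(toy.optimizer.get_weights()))  # 0 -- nothing to overwrite yet

# One gradient update on a dummy batch creates the optimizer's variables
toy.train_on_batch(np.zeros((1, 4)), np.zeros((1, 1)))
print(len(toy.optimizer.get_weights()))  # > 0 after the first update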

I was wondering how to manually initialize the optimizer weights so that the model weights I am trying to load can overwrite them. I thought of training the model for a single epoch on a dummy batch of size 1, but are there any other, more elegant ways to achieve this?

The entire code is on GitHub, but the model I trained is shown below for reference.

from keras.models import Sequential
from keras.layers import (Embedding, LSTM, Bidirectional, Dropout,
                          TimeDistributed, GlobalMaxPooling1D)
from keras import regularizers
from keras_contrib.layers import CRF

# Initialize vocab_size & embedding_weights
# Initialize C, U, N, M, H
# Initialize num_tags & optimizer

model = Sequential()
embedding_layer = Embedding(vocab_size, N,
                            weights=[embedding_weights], mask_zero=True,
                            embeddings_regularizer=regularizers.l2(0.0001))
model.add(TimeDistributed(embedding_layer,
                          input_shape=(C, U)))
model.add(TimeDistributed(Bidirectional(LSTM(M // 2, return_sequences=True,
                                             kernel_regularizer=regularizers.l2(0.0001)))))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(GlobalMaxPooling1D()))
model.add(Bidirectional(LSTM(H // 2, return_sequences=True,
                             kernel_regularizer=regularizers.l2(0.0001))))
model.add(Dropout(0.2))
crf = CRF(num_tags, sparse_target=False, kernel_regularizer=regularizers.l2(0.0001))
model.add(crf)
model.compile(optimizer, loss=crf.loss_function, metrics=[crf.accuracy])

1 Answer


What I ended up doing is almost exactly what I mentioned in my question.

I created a small dummy training & validation set and trained the model on it for a single epoch so that the network's weights, and crucially the optimizer's weights, get initialized. Then I loaded the weights from the previous session with load_all_weights from keras_contrib.utils.save_load_utils and continued training. The code sample below roughly depicts the procedure I used.

# Initialize real_training_set as a 2-tuple with (input, expected_result)
if load_model_file is not None:
    # Initialize dummy_training_set as a 2-tuple with (input, expected_result)
    # One epoch on the dummy set so the optimizer's weights exist
    model.fit_generator(batch_generator_function(dummy_training_set[0],
                                                 dummy_training_set[1], ...),
                        epochs=1)
    # Now the saved weights can overwrite the freshly initialized ones
    save_load_utils.load_all_weights(model, load_model_file)
model.fit_generator(batch_generator_function(real_training_set[0],
                                             real_training_set[1], ...),
                    epochs=1)
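For completeness, here is one way such a dummy set could be built for the model above. The shapes follow from input_shape=(C, U) and the one-hot CRF target implied by sparse_target=False, but the construction itself is illustrative, not the code from the repository:

import numpy as np

# Illustrative dummy batch of size 1, matching the model's expected shapes
dummy_input = np.random.randint(0, vocab_size, size=(1, C, U))  # token ids
dummy_output = np.zeros((1, C, num_tags))                       # one-hot tags
dummy_output[0, :, 0] = 1.0   # label every timestep with tag 0
dummy_training_set = (dummy_input, dummy_output)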

You may view the actual code on GitHub.