I have a simple CNN model that I train on ImageNet, using keras.utils.multi_gpu_model for multi-GPU training. That works fine, but I run into problems when trying to train an SSD model based on the same backbone network. It has a custom loss and several custom layers on top of the backbone:
model, predictor_sizes, input_encoder = build_model(input_shape=(args.img_height, args.img_width, 3),
                                                    n_classes=num_classes, mode='training')
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
loss = SSDMultiBoxLoss(neg_pos_ratio=3, alpha=1.0)

if args.num_gpus > 1:
    model = multi_gpu_model(model, gpus=args.num_gpus)

model.compile(optimizer=optimizer, loss=loss.compute_loss)
model.summary()
With num_gpus == 1, I get the following summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 512, 512, 3) 0
__________________________________________________________________________________________________
conv1_pad (Lambda) (None, 516, 516, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 256, 256, 16) 1216 conv1_pad[0][0]
__________________________________________________________________________________________________
conv1_bn (BatchNormalization) (None, 256, 256, 16) 64 conv1[0][0]
__________________________________________________________________________________________________
conv1_relu (Activation) (None, 256, 256, 16) 0 conv1_bn[0][0]
__________________________________________________________________________________________________
....
det_ctx6_2_mbox_loc_reshape[0][0]
__________________________________________________________________________________________________
mbox_priorbox (Concatenate) (None, None, 8) 0 det_ctx1_2_mbox_priorbox_reshape[
det_ctx2_2_mbox_priorbox_reshape[
det_ctx3_2_mbox_priorbox_reshape[
det_ctx4_2_mbox_priorbox_reshape[
det_ctx5_2_mbox_priorbox_reshape[
det_ctx6_2_mbox_priorbox_reshape[
__________________________________________________________________________________________________
mbox (Concatenate) (None, None, 33) 0 mbox_conf_softmax[0][0]
mbox_loc[0][0]
mbox_priorbox[0][0]
==================================================================================================
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144
However, in the multi-GPU case I can see that all the intermediate layers are packed into an inner model layer:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 512, 512, 3) 0
__________________________________________________________________________________________________
lambda (Lambda) (None, 512, 512, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
lambda_1 (Lambda) (None, 512, 512, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
model (Model) (None, None, 33) 1890510 lambda[0][0]
lambda_1[0][0]
__________________________________________________________________________________________________
mbox (Concatenate) (None, None, 33) 0 model[1][0]
model[2][0]
==================================================================================================
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144
The training runs fine, but I cannot load my pre-trained weights:
model.load_weights(args.weights, by_name=True)
because of the error:
ValueError: Layer #3 (named "model") expects 150 weight(s), but the saved weights have 68 element(s).
Naturally, the pre-trained model contains weights only for the backbone, not for the rest of the object detection model.
Can anybody help me understand:
- Why are all the intermediate layers packed into a single inner model layer?
- Why does this not happen for a classification model?
- How can I either overcome this "model packing" or load the pre-trained weights for this kind of model? (See the sketch after this list.)
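For the last point, one possible workaround, sketched here as an assumption rather than a tested recipe: the multi-GPU summary above shows the original network embedded as a layer named "model", so the weights could be loaded into that inner model, where the backbone layer names still match the checkpoint:

# Hypothetical workaround (untested sketch): reach the inner single-GPU
# model inside the multi-GPU wrapper and load the backbone weights there,
# where the original layer names are preserved.
inner_model = model.get_layer('model')
inner_model.load_weights(args.weights, by_name=True)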
NB: I am using tf.keras, which is now part of TensorFlow.
Build the model first: model, predictor_sizes, input_encoder = build_model(...). Then load the weights: model.load_weights(...). Only after that, make it distributed: model = multi_gpu_model(model, gpus=args.num_gpus). – Dmytro Prylipko
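Putting that comment together as code, a minimal sketch reusing the names from the question (the by_name=True flag and the final compile call are carried over from the snippets above, not prescribed by the comment; build_model, SSDMultiBoxLoss, optimizer, and loss are as defined in the question):

from tensorflow.keras.utils import multi_gpu_model

# Suggested order: build -> load weights -> wrap for multi-GPU -> compile.
model, predictor_sizes, input_encoder = build_model(input_shape=(args.img_height, args.img_width, 3),
                                                    n_classes=num_classes, mode='training')

# Load the pre-trained backbone weights into the single-GPU model,
# where the layer names still match the checkpoint.
model.load_weights(args.weights, by_name=True)

# Only now replicate the model across GPUs and compile the wrapper.
if args.num_gpus > 1:
    model = multi_gpu_model(model, gpus=args.num_gpus)

model.compile(optimizer=optimizer, loss=loss.compute_loss)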