Keras multi-gpu model fails for a custom model

Question

I have a simple CNN model that I train on ImageNet. I use keras.utils.multi_gpu_model for multi-GPU training. It works fine, but I run into problems when I train an SSD model based on the same backbone network. The SSD model has a custom loss and several custom layers on top of the backbone:

model, predictor_sizes, input_encoder = build_model(input_shape=(args.img_height, args.img_width, 3),
                                                    n_classes=num_classes, mode='training')

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
loss = SSDMultiBoxLoss(neg_pos_ratio=3, alpha=1.0)

if args.num_gpus > 1:
    model = multi_gpu_model(model, gpus=args.num_gpus)
model.compile(optimizer=optimizer, loss=loss.compute_loss)
model.summary()

With num_gpus == 1, I get the following summary:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 512, 512, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (Lambda)              (None, 516, 516, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 256, 256, 16) 1216        conv1_pad[0][0]                  
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 256, 256, 16) 64          conv1[0][0]                      
__________________________________________________________________________________________________
conv1_relu (Activation)         (None, 256, 256, 16) 0           conv1_bn[0][0]                   
__________________________________________________________________________________________________

....
                                                                 det_ctx6_2_mbox_loc_reshape[0][0]
__________________________________________________________________________________________________
mbox_priorbox (Concatenate)     (None, None, 8)      0           det_ctx1_2_mbox_priorbox_reshape[
                                                                 det_ctx2_2_mbox_priorbox_reshape[
                                                                 det_ctx3_2_mbox_priorbox_reshape[
                                                                 det_ctx4_2_mbox_priorbox_reshape[
                                                                 det_ctx5_2_mbox_priorbox_reshape[
                                                                 det_ctx6_2_mbox_priorbox_reshape[
__________________________________________________________________________________________________
mbox (Concatenate)              (None, None, 33)     0           mbox_conf_softmax[0][0]          
                                                                 mbox_loc[0][0]                   
                                                                 mbox_priorbox[0][0]              
==================================================================================================
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144

However, in the multi-GPU case, all the intermediate layers are packed into a single nested layer named model:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 512, 512, 3)  0                                            
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 512, 512, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 512, 512, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
model (Model)                   (None, None, 33)     1890510     lambda[0][0]                     
                                                                 lambda_1[0][0]                   
__________________________________________________________________________________________________
mbox (Concatenate)              (None, None, 33)     0           model[1][0]                      
                                                                 model[2][0]                      
==================================================================================================
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144
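The nested model row in this summary comes from how multi_gpu_model builds its wrapper. Because multi_gpu_model needs several GPUs and is no longer available in current TensorFlow releases, the CPU-only sketch below only imitates the structure it appears to build: one slicing Lambda per GPU, the whole template model called once per slice (so Keras lists it as a single Model layer holding all of its weights), and a Concatenate along the batch axis, which in the real wrapper seems to be named after the output layer (mbox above). `build_template`, `slice_batch` and `wrap_like_multi_gpu` are hypothetical names, and the toy input shape stands in for (512, 512, 3).

```python
import numpy as np
import tensorflow as tf

INPUT_SHAPE = (8, 8, 3)  # toy size; the question uses (512, 512, 3)


def build_template():
    # Hypothetical tiny stand-in for the single-GPU SSD model.
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    x = tf.keras.layers.Conv2D(4, 3, padding="same", name="conv1")(inputs)
    x = tf.keras.layers.BatchNormalization(name="conv1_bn")(x)
    x = tf.keras.layers.Activation("relu", name="conv1_relu")(x)
    outputs = tf.keras.layers.GlobalAveragePooling2D(name="pool")(x)
    return tf.keras.Model(inputs, outputs, name="template")


def slice_batch(x, part, parts):
    # Contiguous slice of the batch; the last slice also takes the remainder.
    batch = tf.shape(x)[0]
    size = batch // parts
    start = part * size
    end = batch if part == parts - 1 else start + size
    return x[start:end]


def wrap_like_multi_gpu(template, parts=2):
    # Imitates only the *structure* of a multi_gpu_model wrapper, with no
    # device placement: one Lambda per slice, the whole template called once
    # per slice (so it appears as one nested layer), then a Concatenate
    # along the batch axis.
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    towers = []
    for i in range(parts):
        sliced = tf.keras.layers.Lambda(
            slice_batch, arguments={"part": i, "parts": parts})(inputs)
        towers.append(template(sliced))
    merged = tf.keras.layers.Concatenate(axis=0, name="merged")(towers)
    return tf.keras.Model(inputs, merged, name="wrapper")


template = build_template()
wrapper = wrap_like_multi_gpu(template, parts=2)
wrapper.summary()  # a single "template" row holds every backbone weight

# conv1, conv1_bn, ... are not top-level layers of the wrapper.
print([layer.name for layer in wrapper.layers])
```

Feeding the same batch to `wrapper` and `template` gives the same result, because the slices are concatenated back in order. As far as I can tell, this is also why `load_weights(..., by_name=True)` on the wrapper cannot reach the backbone layers: by-name loading matches top-level layer names, and at the top level the whole network is one layer named model.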

Training runs fine, but I cannot load the previously pre-trained weights:

model.load_weights(args.weights, by_name=True)

because of the error:

ValueError: Layer #3 (named "model") expects 150 weight(s), but the saved weights have 68 element(s).

Of course, the pre-trained model only has weights for the backbone, not for the rest of the object detection model.

Can anybody help me understand:

Why are all the intermediate layers packed into a single nested layer (model in the summary above)?
Why doesn't this happen for a classification model?
How can I either overcome the "model packing" or load the pre-trained weights for this kind of model?

NB: I am using tf.keras, which is now part of TensorFlow.

Hi, I have the same issue with a custom model and the Keras multi-GPU model. Did you find a convenient way around this? — Jayant Agrawal
I did not. The answer below is not very clear to me. Does it work for you? If yes, could you please clarify and explain? Thanks! — Jayant Agrawal
First, build the model: model, predictor_sizes, input_encoder = build_model(...). Then, load the weights: model.load_weights(...). Only after that, make it distributed: model = multi_gpu_model(model, gpus=args.num_gpus). — Dmytro Prylipko
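The order described in this comment can be sketched with toy models. To keep the example runnable without a weights file, `load_weights(args.weights, by_name=True)` is replaced by a hypothetical in-memory helper, `copy_weights_by_name`, that copies weights between top-level layers with matching names; `build_classifier` and `build_detector` are likewise hypothetical stand-ins for the ImageNet model and for `build_model(...)`. The wrapping step is left as a comment.

```python
import numpy as np
import tensorflow as tf

INPUT_SHAPE = (8, 8, 3)  # toy size; the question uses (512, 512, 3)


def add_backbone(x):
    # Identical layer names in both models are what by-name loading relies on.
    x = tf.keras.layers.Conv2D(4, 3, padding="same", name="conv1")(x)
    x = tf.keras.layers.BatchNormalization(name="conv1_bn")(x)
    return tf.keras.layers.Activation("relu", name="conv1_relu")(x)


def build_classifier(num_classes=10):
    # Hypothetical stand-in for the ImageNet classification model.
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    x = tf.keras.layers.GlobalAveragePooling2D(name="pool")(add_backbone(inputs))
    outputs = tf.keras.layers.Dense(num_classes, name="fc")(x)
    return tf.keras.Model(inputs, outputs)


def build_detector(num_outputs=33):
    # Hypothetical stand-in for build_model(...): backbone plus a new head.
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    x = tf.keras.layers.Conv2D(num_outputs, 1, name="det_head")(add_backbone(inputs))
    outputs = tf.keras.layers.Reshape((-1, num_outputs), name="mbox")(x)
    return tf.keras.Model(inputs, outputs)


def copy_weights_by_name(source, target):
    # In-memory analogue of target.load_weights(path, by_name=True): copy
    # weights only for top-level layers whose names exist in both models.
    target_names = {layer.name for layer in target.layers}
    copied = []
    for layer in source.layers:
        if layer.name in target_names and layer.weights:
            target.get_layer(layer.name).set_weights(layer.get_weights())
            copied.append(layer.name)
    return copied


pretrained = build_classifier()
# 1) Build the plain single-device model.
model = build_detector()
# 2) Load the backbone weights while its layers are still top-level layers
#    (in the real script: model.load_weights(args.weights, by_name=True)).
print(copy_weights_by_name(pretrained, model))  # -> ['conv1', 'conv1_bn']
# 3) Only now wrap it for several GPUs, e.g.
#    model = multi_gpu_model(model, gpus=args.num_gpus)
#    (not run here: multi_gpu_model is not available in current TensorFlow).
```

Because the weights are transferred before wrapping, the backbone names still match; the detection head (`det_head`) keeps its fresh initialization, just as by-name loading skips layers that are missing from the file.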
So, I found a solution: do not save the parallel model; save the original model that was used to create it instead. This is described here: textpert.ai/aime-blog/… — Jayant Agrawal
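A sketch of this approach, assuming a standard tf.keras callback. The Keras documentation for multi_gpu_model likewise recommended saving through the template model (the one passed to multi_gpu_model) rather than the returned parallel model. `TemplateCheckpoint` and `build_template` are hypothetical names, and since the sketch runs on one device, the template itself is trained in place of the parallel model.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


class TemplateCheckpoint(tf.keras.callbacks.Callback):
    """Saves the template (single-device) model's weights after each epoch.

    `self.model` inside a callback is the model being fit, i.e. the parallel
    wrapper; saving `template` instead keeps the plain layer structure.
    """

    def __init__(self, template, path_pattern):
        super().__init__()
        self.template = template
        self.path_pattern = path_pattern

    def on_epoch_end(self, epoch, logs=None):
        self.template.save_weights(self.path_pattern.format(epoch=epoch + 1))


def build_template():
    # Hypothetical tiny model standing in for the SSD template.
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2, name="head")])


template = build_template()
# With several GPUs this would be: parallel = multi_gpu_model(template, gpus=N).
# multi_gpu_model reuses the template's layers, so training `parallel` also
# updates `template`. Here the template is simply trained directly.
parallel = template
parallel.compile(optimizer="adam", loss="mse")

out_dir = tempfile.mkdtemp()
pattern = os.path.join(out_dir, "template_{epoch:02d}.weights.h5")
x = np.random.rand(16, 4).astype("float32")
y = np.random.rand(16, 2).astype("float32")
parallel.fit(x, y, epochs=1, verbose=0,
             callbacks=[TemplateCheckpoint(template, pattern)])

# The saved file loads into a freshly built single-device model.
restored = build_template()
restored.load_weights(pattern.format(epoch=1))
```

The checkpoint files contain only the template's ordinary layers, so they can later be loaded before (or without) wrapping the model again.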

Lena Muzyka · Accepted Answer · 2019-01-19T17:56:53

You can load the weights right after building the model, before converting it into its multi-GPU counterpart. Alternatively, you can keep two objects, one for the single-GPU version and one for the multi-GPU version: use the first to load the weights and the second to train.
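The two-object variant works because a wrapper built around a model reuses that model's layers rather than copying them, so weights loaded into the single-GPU object are immediately visible through the multi-GPU one. In the sketch below, a plain nesting model stands in for `multi_gpu_model(template, gpus=N)` (an assumption, since the real call needs several GPUs and is not available in current TensorFlow), and `set_weights` from an identically built model stands in for `load_weights`; `build_template` is hypothetical.

```python
import numpy as np
import tensorflow as tf


def build_template():
    # Hypothetical tiny stand-in for the single-GPU model.
    inputs = tf.keras.Input(shape=(4,))
    outputs = tf.keras.layers.Dense(3, name="head")(inputs)
    return tf.keras.Model(inputs, outputs, name="template")


# Object 1: the single-device template, used for loading (and saving) weights.
template = build_template()

# Object 2: the model used for training. Stand-in for
# multi_gpu_model(template, gpus=N): like it, this calls the template inside
# a new outer model, so both objects share the same layers and variables.
inputs = tf.keras.Input(shape=(4,))
parallel = tf.keras.Model(inputs, template(inputs), name="parallel")

# Weights loaded into the template *after* wrapping are seen by the wrapper.
# In the real script: template.load_weights(args.weights, by_name=True).
pretrained = build_template()
template.set_weights(pretrained.get_weights())

x = np.random.rand(5, 4).astype("float32")
print(np.allclose(parallel(x).numpy(), pretrained(x).numpy()))  # True
```

Training then goes through `parallel`, while loading and saving go through `template`, whose layer names are the ones stored in the weight files.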
