
I have a simple CNN model that I train on ImageNet, and I use keras.utils.multi_gpu_model for multi-GPU training. That works fine, but I run into problems when training an SSD model built on the same backbone network. The SSD model has a custom loss and several custom layers on top of the backbone:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import multi_gpu_model

model, predictor_sizes, input_encoder = build_model(input_shape=(args.img_height, args.img_width, 3),
                                                    n_classes=num_classes, mode='training')

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
loss = SSDMultiBoxLoss(neg_pos_ratio=3, alpha=1.0)

if args.num_gpus > 1:
    model = multi_gpu_model(model, gpus=args.num_gpus)
model.compile(optimizer=optimizer, loss=loss.compute_loss)

With num_gpus == 1, I get the following summary:

Layer (type)                    Output Shape         Param #     Connected to
input_1 (InputLayer)            (None, 512, 512, 3)  0
conv1_pad (Lambda)              (None, 516, 516, 3)  0           input_1[0][0]
conv1 (Conv2D)                  (None, 256, 256, 16) 1216        conv1_pad[0][0]
conv1_bn (BatchNormalization)   (None, 256, 256, 16) 64          conv1[0][0]
conv1_relu (Activation)         (None, 256, 256, 16) 0           conv1_bn[0][0]
...
mbox_priorbox (Concatenate)     (None, None, 8)      0           det_ctx1_2_mbox_priorbox_reshape[
mbox (Concatenate)              (None, None, 33)     0           mbox_conf_softmax[0][0]
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144

However, in the multi-GPU case, all the intermediate layers are packed into a single nested layer named "model":

Layer (type)                    Output Shape         Param #     Connected to
input_1 (InputLayer)            (None, 512, 512, 3)  0
lambda (Lambda)                 (None, 512, 512, 3)  0           input_1[0][0]
lambda_1 (Lambda)               (None, 512, 512, 3)  0           input_1[0][0]
model (Model)                   (None, None, 33)     1890510     lambda[0][0]
mbox (Concatenate)              (None, None, 33)     0           model[1][0]
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144

Training runs fine, but I cannot load the previously pre-trained weights:

model.load_weights(args.weights, by_name=True)

because of the error:

ValueError: Layer #3 (named "model") expects 150 weight(s), but the saved weights have 68 element(s).

Of course, the pre-trained model only has weights for the backbone, not for the rest of the object detection model.

Can anybody help me understand:

  • Why are all the intermediate layers packed into a single Model layer?
  • Why does this not happen for a classification model?
  • How can I either avoid this "model packing" or load the pre-trained weights into this kind of model?

NB: I am using tf.keras, which is now part of TensorFlow.
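The packing in the multi-GPU summary can be reproduced with a simplified, CPU-only imitation of multi_gpu_model. As far as its implementation goes, multi_gpu_model slices each input batch with one Lambda layer per replica, calls the whole template model on every slice (each call placed under tf.device('/gpu:i')), and concatenates the replica outputs along the batch axis into a layer named after the model output. Calling a complete Model on a tensor inside another functional model adds it as one nested layer, which matches the lambda, lambda_1, model (Model) and mbox (Concatenate) rows above. This is a sketch under assumptions: build_template, slice_batch and wrap_like_multi_gpu are hypothetical names, the model is tiny, and device placement is omitted.

```python
import numpy as np
import tensorflow as tf

INPUT_SHAPE = (32, 32, 3)  # hypothetical, much smaller than the 512x512 SSD input


def build_template():
    """Hypothetical stand-in for build_model(): a tiny single-device model named 'model'."""
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", name="conv1")(inputs)
    x = tf.keras.layers.BatchNormalization(name="conv1_bn")(x)
    x = tf.keras.layers.Activation("relu", name="conv1_relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D(name="pool")(x)
    outputs = tf.keras.layers.Dense(8, name="predictions")(x)
    return tf.keras.Model(inputs, outputs, name="model")


def slice_batch(x, i, parts):
    """Return the i-th of `parts` chunks along the batch axis (the last chunk takes the remainder)."""
    batch = tf.shape(x)[0]
    step = batch // parts
    start = i * step
    end = batch if i == parts - 1 else start + step
    return x[start:end]


def wrap_like_multi_gpu(template, parts=2):
    """Simplified imitation of multi_gpu_model, without any GPU placement."""
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    replica_outputs = []
    for i in range(parts):
        # One Lambda per replica: these correspond to the lambda / lambda_1 rows.
        chunk = tf.keras.layers.Lambda(slice_batch, arguments={"i": i, "parts": parts})(inputs)
        # Calling the whole template adds it as ONE nested layer, shared by all replicas.
        replica_outputs.append(template(chunk))
    # Merge the replica outputs back into one batch, like the mbox (Concatenate) row.
    merged = tf.keras.layers.Concatenate(axis=0, name="mbox")(replica_outputs)
    return tf.keras.Model(inputs, merged)


template = build_template()
parallel = wrap_like_multi_gpu(template, parts=2)
# Top-level layers of the wrapper: only the nested template carries any weights.
for layer in parallel.layers:
    print(layer.name, type(layer).__name__, len(layer.weights))
```

Because the nested model is the only top-level layer with weights, load_weights(..., by_name=True) on the wrapper matches the file against that one layer as a whole (the 150 weights in the error message) instead of against the individual backbone layers.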

Hi, I have the same issue with a custom model and the Keras multi-GPU model. Did you find a convenient way to get around this? - Jayant Agrawal
Did you try the answer below? - Dmytro Prylipko
I did not. The answer below is not very clear to me. Does it work for you? If so, could you please clarify and explain? Thanks! - Jayant Agrawal
First, build the model: model, predictor_sizes, input_encoder = build_model(...). Then load the weights: model.load_weights(...). Only after that, make it distributed: model = multi_gpu_model(model, gpus=args.num_gpus). - Dmytro Prylipko
So, I found a solution: the best way is not to save the parallel model but to save the original model (the one used to create the parallel model). This is described here: textpert.ai/aime-blog/… - Jayant Agrawal

You can load the weights right after building the model, before converting it into its multi-GPU counterpart. Alternatively, you can keep two objects, one for the single-GPU version and one for the multi-GPU version: use the first one to load the weights and the second one to train.
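The load-first, two-object pattern can be sketched on the CPU as follows. It rests on assumptions: build_backbone and wrap are hypothetical stand-ins for build_model and multi_gpu_model (wrap only nests the template and does not slice the batch or place replicas on GPUs), and copying weights from a second instance with set_weights stands in for load_weights(args.weights, by_name=True). What it shows is that the wrapper reuses the template's variables instead of copying them, so weights loaded into the template before wrapping are the ones being trained, and after training the template already holds the updated weights for saving.

```python
import numpy as np
import tensorflow as tf


def build_backbone():
    """Hypothetical stand-in for build_model(): a tiny model named 'model'."""
    inputs = tf.keras.Input(shape=(8,))
    x = tf.keras.layers.Dense(16, activation="relu", name="conv1")(inputs)
    outputs = tf.keras.layers.Dense(4, name="head")(x)
    return tf.keras.Model(inputs, outputs, name="model")


def wrap(template):
    """Hypothetical stand-in for multi_gpu_model(template, gpus=...): nests the template."""
    inputs = tf.keras.Input(shape=(8,))
    return tf.keras.Model(inputs, template(inputs))


# 1. Build the single-device template.
template = build_backbone()

# 2. Load weights into the template BEFORE wrapping.
#    Real script (assumed): template.load_weights(args.weights, by_name=True)
pretrained = build_backbone()
template.set_weights(pretrained.get_weights())

# 3. Wrap and train the wrapper; it shares the template's variables.
parallel = wrap(template)
parallel.compile(optimizer="sgd", loss="mse")
x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 4).astype("float32")
before = [w.copy() for w in template.get_weights()]
parallel.fit(x, y, epochs=1, verbose=0)

# 4. Save through the template, not the wrapper.
#    Real script (assumed): template.save_weights(...)
after = template.get_weights()
```

This also matches the note in the multi_gpu_model documentation that saving should be done with the template model passed to multi_gpu_model rather than with the model it returns.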


When compiling your multi-GPU model, assign the result of multi_gpu_model to a new variable, such as 'model_multiGPU', instead of overwriting the original. Then, after training, load the weights using the original model that you passed to multi_gpu_model. This will solve the problem.