1
votes

I have a simple CNN model that I train on ImageNet. I use keras.utils.multi_gpu_model for multi-GPU training. It works fine, but I run into problems when trying to train an SSD model based on the same backbone network. It has a custom loss and several custom layers on top of the backbone:

model, predictor_sizes, input_encoder = build_model(input_shape=(args.img_height, args.img_width, 3),
                                                    n_classes=num_classes, mode='training')

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
loss = SSDMultiBoxLoss(neg_pos_ratio=3, alpha=1.0)

if args.num_gpus > 1:
    model = multi_gpu_model(model, gpus=args.num_gpus)
model.compile(optimizer=optimizer, loss=loss.compute_loss)
model.summary()

With num_gpus == 1, I get the following summary:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 512, 512, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (Lambda)              (None, 516, 516, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 256, 256, 16) 1216        conv1_pad[0][0]                  
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 256, 256, 16) 64          conv1[0][0]                      
__________________________________________________________________________________________________
conv1_relu (Activation)         (None, 256, 256, 16) 0           conv1_bn[0][0]                   
__________________________________________________________________________________________________

....
                                                                 det_ctx6_2_mbox_loc_reshape[0][0]
__________________________________________________________________________________________________
mbox_priorbox (Concatenate)     (None, None, 8)      0           det_ctx1_2_mbox_priorbox_reshape[
                                                                 det_ctx2_2_mbox_priorbox_reshape[
                                                                 det_ctx3_2_mbox_priorbox_reshape[
                                                                 det_ctx4_2_mbox_priorbox_reshape[
                                                                 det_ctx5_2_mbox_priorbox_reshape[
                                                                 det_ctx6_2_mbox_priorbox_reshape[
__________________________________________________________________________________________________
mbox (Concatenate)              (None, None, 33)     0           mbox_conf_softmax[0][0]          
                                                                 mbox_loc[0][0]                   
                                                                 mbox_priorbox[0][0]              
==================================================================================================
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144

However, in the multi-GPU case, all the intermediate layers are packed into a single nested layer named "model":

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 512, 512, 3)  0                                            
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 512, 512, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 512, 512, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
model (Model)                   (None, None, 33)     1890510     lambda[0][0]                     
                                                                 lambda_1[0][0]                   
__________________________________________________________________________________________________
mbox (Concatenate)              (None, None, 33)     0           model[1][0]                      
                                                                 model[2][0]                      
==================================================================================================
Total params: 1,890,510
Trainable params: 1,888,366
Non-trainable params: 2,144

Training runs fine, but I cannot load the pre-trained weights:

model.load_weights(args.weights, by_name=True)

because of the error:

ValueError: Layer #3 (named "model") expects 150 weight(s), but the saved weights have 68 element(s).

Of course, the pre-trained model only has weights for the backbone, not for the rest of the object detection model.

Can anybody help me understand:

  • Why are all the intermediate layers packed into a single nested "model" layer?
  • Why does this not happen for a classification model?
  • How can I either avoid the "model packing" or load the pre-trained weights for this kind of model?

NB: I am using tf.keras, which is now part of TensorFlow.

2
Hi, I also have the same issue with a custom model with keras multi gpu model. Did you find a convenient way to get over this? - Jayant Agrawal
Did you try the answer below? - Dmytro Prylipko
I did not. The answer below is not very clear for me. Does it work for you? If yes, could you please clarify and explain? Thanks! - Jayant Agrawal
First, build the model: model, predictor_sizes, input_encoder = build_model(...). Then, load the weights: model.load_weights(...). Only after that, make it distributed: model = multi_gpu_model(model, gpus=args.num_gpus). - Dmytro Prylipko
So, I found a solution: the best way to do this is not to save the parallel model but to save the original model (used to create the parallel model). This is described here: textpert.ai/aime-blog/… - Jayant Agrawal

2 Answers

0
votes

You can load the weights right after building the model, before converting it into its multi-GPU counterpart. Alternatively, you can keep two objects, one for the single-GPU version and one for the multi-GPU version, and use the first to load the weights and the second to train.
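
A minimal sketch of that order of operations, assuming the model object exposes the Keras-style load_weights method and that the wrapper has the multi_gpu_model(model, gpus=...) signature used in the question. The helper name prepare_models and its parameters are hypothetical; the wrapper is passed in as an argument so the snippet does not depend on a particular TensorFlow version.

```python
def prepare_models(build_fn, parallelize, num_gpus, weights_path=None):
    """Build the single-GPU model, load weights into it, then wrap it.

    build_fn    -- hypothetical zero-argument callable returning the single-GPU model
    parallelize -- wrapper called as parallelize(model, gpus=...), e.g. multi_gpu_model
    """
    template = build_fn()
    if weights_path is not None:
        # Load while the layers are still top-level and addressable by name.
        template.load_weights(weights_path, by_name=True)
    if num_gpus > 1:
        # After wrapping, the original layers sit inside one nested layer,
        # as in the multi-GPU summary shown in the question.
        parallel = parallelize(template, gpus=num_gpus)
    else:
        parallel = template
    # Keep both objects: the template for weights I/O, the other for training.
    return template, parallel
```

With the question's code this could be called roughly as template, model = prepare_models(lambda: build_model(...)[0], multi_gpu_model, args.num_gpus, args.weights), followed by model.compile(...) on the returned training model.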

0
votes

When creating your multi-GPU model, assign the result to a new variable, such as 'model_multiGPU', instead of overwriting the original. Then, after training, load weights using the original model that you passed to the multi_gpu_model function. This will solve the problem.
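
A rough sketch of that pattern: train through the multi-GPU wrapper, but handle weights through the original model, which shares its layers (and therefore its weights) with the wrapper. The function name train_and_save and its parameters are hypothetical; fit and save_weights are the usual Keras model methods.

```python
def train_and_save(template, model_multi_gpu, weights_path, *fit_args, **fit_kwargs):
    """Train with the parallel wrapper, persist weights via the original model."""
    # The wrapper reuses the template's layers, so fitting it updates
    # the template's weights as well.
    history = model_multi_gpu.fit(*fit_args, **fit_kwargs)
    # Saving from the template keeps the original per-layer names in the file,
    # so the weights can later be loaded into a freshly built single-GPU model.
    template.save_weights(weights_path)
    return history
```

Here template is the model returned by build_model and model_multi_gpu is the object returned by multi_gpu_model(template, gpus=...).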