I'm encountering the problem that I cannot successfully split my training batches across more than one GPU. If multi_gpu_model from tensorflow.keras.utils is used, TensorFlow allocates the full memory on all available GPUs (for example 2), but watching nvidia-smi shows that only the first one (gpu[0]) is utilized at 100%.

I'm using TensorFlow 1.12 right now.

Test on single device

model = getSimpleCNN(... some parameters)

model.compile()
model.fit()

As expected, the data is loaded by the CPU and the model runs on gpu[0] at 97% - 100% GPU utilization:

[screenshot: timeline showing gpu[0] at 97-100% utilization]

Create a multi_gpu model

As described in the TensorFlow API documentation for multi_gpu_model here, the device scope for the model definition is not changed.

from tensorflow.keras.utils import multi_gpu_model

model = getSimpleCNN(... some parameters)
parallel_model = multi_gpu_model(model, gpus=2, cpu_merge=False)  # weights merge on GPU (recommended for NV-link)

parallel_model.compile()
parallel_model.fit()

As seen in the timeline, the CPU is now not only loading the data but also doing some other calculations. Notice that the second GPU is doing nearly nothing:

[screenshot: timeline showing gpu[1] almost idle]
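To check where the replicated ops actually end up, one option (a diagnostic sketch for TF 1.x graph mode, not something I have verified fixes the issue) is to enable device-placement logging before building the model:

import tensorflow as tf

# Log the device each op is assigned to, to see whether the
# replicated towers really land on /gpu:0 and /gpu:1.
config = tf.ConfigProto(log_device_placement=True)
tf.keras.backend.set_session(tf.Session(config=config))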

The question

The effect gets even worse as soon as four GPUs are used. Utilization of the first one goes up to 100%, but the rest only show short peaks.

Is there any solution to fix this? How can I properly train on multiple GPUs?

Is there any difference between tensorflow.keras.utils and keras.utils that causes the unexpected behavior?


1 Answer


I just ran into the same issue. In my case, the problem came from the use of a build_model(... parameters) function that returned the model. Be careful with your getSimpleCNN() function: since I don't know what is in it, my best advice is to build the model directly in your training script without using this function.
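For illustration, a minimal inline definition could look like the sketch below; the layer sizes and input shape are just placeholders, the point is only that the model is defined directly in the script rather than returned from a helper:

from tensorflow.keras import layers, models
from tensorflow.keras.utils import multi_gpu_model

# Minimal CNN defined directly in the training script
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])

parallel_model = multi_gpu_model(model, gpus=2, cpu_merge=False)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')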