1
votes

Specs

Docker container running on a machine with 16 GB RAM, 1x GTX 1070 with 8 GB, Ubuntu 16.04.3 LTS. Keras is set to use the GPU.

What I want to do

I want to compute the convolutional output for a set of 79,726 RGB images (245x245) so I can then get predictions from a secondary model that is already trained. I am using the VGG16 model that comes with keras.applications.

Code

from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator
import math

model = VGG16(include_top=False)  # convolutional base only, no dense top
tst_b_s = 200
test_batches = ImageDataGenerator().flow_from_directory(
    directory='test/',
    target_size=(245, 245),
    batch_size=tst_b_s,
    shuffle=False,
)
# steps must be an integer; ceil covers the final, partial batch
test_feats = model.predict_generator(
    test_batches,
    steps=math.ceil(test_batches.samples / tst_b_s),
    verbose=1,
)

Problem

predict_generator runs for a while, then throws:

ResourceExhaustedError: OOM when allocating tensor with shape[200,64,245,245] [[Node: block1_conv2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](block1_conv1/Relu, block1_conv2/kernel/read)]] [[Node: block5_pool/MaxPool/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_127_block5_pool/MaxPool", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

Update: using a smaller batch size (10)

The prediction process still halts, but this time with an internal error:

InternalError: Dst tensor is not initialized. [[Node: block5_pool/MaxPool/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_127_block5_pool/MaxPool", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

There are no other processes using the GPU.

Thank you

It seems that you have set the batch size to 200. That's a really huge value. Could you try a smaller one? – Marcin Możejko
Are there any other processes running on the same GPU? What does nvidia-smi say about the allocated memory, e.g. when your application is not running? Do you see any processes in the output of nvidia-smi which you recognize? – Andre Holzner

3 Answers

0
votes

You are using too much GPU memory. Try using a smaller batch size or making sure no other processes are running on the same GPU.
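
For example, a minimal sketch, assuming the VGG16 model and ImageDataGenerator setup from the question; 32 is just an arbitrary smaller value:

import math
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator

model = VGG16(include_top=False)

tst_b_s = 32  # much smaller than 200, so the per-layer activations fit in 8 GB
test_batches = ImageDataGenerator().flow_from_directory(
    directory='test/',
    target_size=(245, 245),
    batch_size=tst_b_s,
    shuffle=False,
)
test_feats = model.predict_generator(
    test_batches,
    steps=math.ceil(test_batches.samples / tst_b_s),
    verbose=1,
)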

0
votes

The first error is because your GPU memory cannot hold the buffers generated by your neural network at that batch size.

The second one is because Keras struggles to release TensorFlow sessions. You can release a session explicitly:

import tensorflow as tf
tf.keras.backend.clear_session()

You can also check which processes are using your GPU by running nvidia-smi in a shell. You'll see a process using the whole memory of your GPU. Then just "kill -9" that process and you'll be able to run your TensorFlow code again.
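
If you use standalone Keras (as in the question) rather than tf.keras, the equivalent call is keras.backend.clear_session(). A minimal sketch of clearing the old session and rebuilding the model afterwards:

from keras import backend as K
from keras.applications.vgg16 import VGG16

# Drop the previous TensorFlow graph and session so their GPU memory can be reclaimed
K.clear_session()

# Rebuild the model in the fresh session before predicting again
model = VGG16(include_top=False)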

0
votes

To the best of my knowledge, batch size does not affect the inference results. Hence you can use whatever smaller batch size your GPU can handle, and there is no need to worry that a smaller one will cause any problems.
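
If you want to convince yourself, here is a minimal sketch comparing features computed with two different batch sizes; the random array is just a hypothetical stand-in for real images:

import numpy as np
from keras.applications.vgg16 import VGG16

model = VGG16(include_top=False)

# Hypothetical stand-in for a small sample of real 245x245 RGB images
sample = np.random.rand(20, 245, 245, 3).astype('float32')

# Same features regardless of batch size, up to floating-point noise
feats_a = model.predict(sample, batch_size=20)
feats_b = model.predict(sample, batch_size=5)
print(np.allclose(feats_a, feats_b, atol=1e-5))  # expected: True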