
I'm trying to train a simple MLP on my own dataset in Python with Keras. The dataset consists of normalized images at a size of 1024 x 1024; I need this resolution, so I can't shrink the images. I use a Tesla V100 with 16 GB for training.

My aim is first of all to get something working before I tune the model (turn it into a CNN, etc.), but right now it doesn't, because of:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1048576,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

This error occurs at the first layer, so before the training really begins.

I already trained an MLP in Julia with Flux without memory problems.

Everything I tried:

MLP in Julia (Flux)

m = Chain(
  Dense(1024*1024, 1024, relu),
  Dense(1024, 256, relu),
  Dense(256, 2),
  softmax) |> gpu

MLP in Python (Keras)

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(4 * 1024, activation='relu', input_shape=(1024 * 1024,)))
model.add(Dense(1024, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
1024 * 1024 (inputs) * 4096 (nodes in the next layer) * 4 bytes per float32 = 16 GB before even getting to the second layer, and without even thinking about overhead. – hobbs

Your GPU can't support the network you're trying to create. You need to rethink why that architecture is required for your application. – rayryeng

I think you discovered why we don't use MLPs for image classification; CNNs are much more memory efficient. – Dr. Snoopy

Thank you guys, the solution was to use a CNN with strides and pooling. – bilal32
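hobbs's arithmetic can be checked in a couple of lines of plain Python: the float32 weight matrix of just the first Dense layer already fills the entire 16 GB V100.

```python
inputs = 1024 * 1024        # flattened 1024x1024 image
units = 4 * 1024            # nodes in the first Dense layer
bytes_per_float32 = 4

# memory for the first layer's weight matrix alone
weight_bytes = inputs * units * bytes_per_float32
print(weight_bytes / 2**30, "GiB")  # 16.0 GiB
```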

1 Answer


There are three main things at play in this situation: the size of your inputs, the size of your network (number of parameters), and the GPU RAM. These are large inputs, so your first move should be to turn down the batch size. Take it to 1, and if you still hit this error, either shrink your network or (more effectively) reduce your input size. Yes, that means information is lost from the signal before processing, but you'd be amazed what a network can infer from lower-resolution images. I understand not wanting to give up that information, though, so play with the batch size first. For example, 762x762x3 inputs can run through Xception on 16 GB with batch size 6, but not batch size 8.

As for the network architecture, the output sizes of those Dense layers are huge, probably unnecessarily so, so consider cranking them down. And, as Matias said, reduction through convolution is highly recommended.
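A minimal sketch of what "a CNN with strides and pooling" could look like here, assuming tf.keras and grayscale 1024x1024 inputs (the filter counts and layer sizes are illustrative assumptions, not taken from the original post). Strided convolutions and pooling shrink the spatial resolution before any dense layer, so the 1M-input weight matrix from the MLP is never allocated.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024, 1024, 1)),
    # strided convs halve the resolution while extracting features
    tf.keras.layers.Conv2D(16, 3, strides=2, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    # collapse the remaining spatial grid to one vector per channel
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.summary()
```

The whole model has on the order of ten thousand parameters, versus the roughly 4 billion in the first Dense layer of the MLP, while still consuming the full 1024x1024 input.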