15
votes

I am trying to model a fully convolutional neural network using the Keras library with a TensorFlow backend.

The issue I face is feeding differently sized images in batches to the model.fit() function. The training set consists of images of different sizes, varying from 768x501 to 1024x760.

Not more than 5 images have the same dimensions, so grouping them into batches seems to be of no help.

NumPy allows storing the arrays together in a Python list, but the Keras model.fit() function throws an error when it receives a list instead of a single training array.

I do not wish to resize the images and lose information, as I already have a very small dataset.

How do I go about training this network?
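One workaround is to bucket the list of arrays by shape and call model.fit() (or train_on_batch) once per bucket. A minimal sketch of the bucketing step, using only NumPy; the group_by_shape helper is hypothetical, not part of Keras:

```python
import numpy as np

def group_by_shape(images):
    """Group a list of (H, W) arrays into batches of identical shape."""
    buckets = {}
    for img in images:
        buckets.setdefault(img.shape, []).append(img)
    # Stack each bucket into a (N, H, W, 1) batch that model.fit() accepts
    return {shape: np.stack(batch)[..., np.newaxis]
            for shape, batch in buckets.items()}
```

Each resulting batch can then be passed to model.fit() separately, or fed one batch at a time via train_on_batch.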

1
You can try padding the smaller images so that they all have the same size as the largest image. If that is not a valid solution for you, try reading this; towards the end someone poses the same question as you. – gionni
Why don't you want to resize the data? How many images do you have? – finbarr
I have just about 1200 images for training. Some are 1024x(~650) and some are 768x(~520). So I split them into two batches and trained the model on them. I did not want to resize because I wanted to use purely unaltered data, but it seems like resizing, or padding, is the only way to go. – Blue
So, I am training a fully convolutional network: image in, image out, so my output images are the same size as my input. What I don't understand is this: I trained with input shape (None, None, 1), using two batches of training images, 1024x680 and 760x520. But when I feed an input of size 1024x720, the model predicts an output of size 1024x680, which corresponds to one of my training batch sizes. How do I build a network that can take an input image of any size and produce the desired output, an image of the same size as the input? – Blue
Resizing or cropping is the way to go. – Fred Guth
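The zero-padding suggested in the comments above can be sketched with plain NumPy; pad_to_common_size is a hypothetical helper, and masking or cropping the padded regions in the loss is left out:

```python
import numpy as np

def pad_to_common_size(images, target_h, target_w):
    """Zero-pad each (H, W) image into one (N, target_h, target_w, 1) batch."""
    batch = np.zeros((len(images), target_h, target_w, 1), dtype=np.float32)
    for i, img in enumerate(images):
        h, w = img.shape
        batch[i, :h, :w, 0] = img  # original pixels kept in the top-left corner
    return batch
```

This lets every image join a single batch without resampling the original pixels, at the cost of the network also seeing the zero-padded borders.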

1 Answer

2
votes

I think Spatial Pyramid Pooling (SPP) might be helpful. Check out this paper.

We note that SPP has several remarkable properties for deep CNNs:

1) SPP is able to generate a fixed-length output regardless of the input size, while the sliding window pooling used in the previous deep networks cannot;

2) SPP uses multi-level spatial bins, while the sliding window pooling uses only a single window size. Multi-level pooling has been shown to be robust to object deformations;

3) SPP can pool features extracted at variable scales thanks to the flexibility of input scales. Through experiments we show that all these factors elevate the recognition accuracy of deep networks.


yhenon has implemented SPP for Keras on GitHub.
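The fixed-length property can be illustrated with a NumPy sketch of the pooling step itself. This spatial_pyramid_pool function is a simplified illustration, not yhenon's Keras layer, and it assumes the feature map is at least as large as the finest pyramid level:

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) map into a fixed-length vector of
    sum(n * n for n in levels) * C values, regardless of H and W."""
    h, w, c = feature_map.shape
    pooled = []
    for n in levels:
        rows = np.array_split(np.arange(h), n)  # n roughly equal row bands
        cols = np.array_split(np.arange(w), n)  # n roughly equal column bands
        for r in rows:
            for col in cols:
                window = feature_map[r[0]:r[-1] + 1, col[0]:col[-1] + 1, :]
                pooled.append(window.max(axis=(0, 1)))  # one max per channel
    return np.concatenate(pooled)
```

With levels (1, 2, 4) the output always has (1 + 4 + 16) * C values, so a dense layer can follow even though the input images differ in size.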