CNN for variable sized images in pytorch

Question

I want to build a CNN model in PyTorch that can be fed images of different sizes. I am using a 2D convolution layer, which takes a 4D input (PyTorch's Conv2d expects a batch of 2D images, so its input has 4 dimensions). However, I'm not sure how to set up the input layer so that it maps all the variable-sized images to a fixed number of feature maps to pass on to the remaining layers. For example, the input shape for color images is [4, 3, 32, 32], which corresponds to batch size, number of channels (RGB), height, and width. Grayscale images have shape [4, 1, 32, 32], which produces an error because the channel count is not what the layer expects.

The error message is "RuntimeError: Given groups=1, weight of size [6, 3, 5, 5], expected input[4, 1, 32, 32] to have 3 channels, but got 1 channels instead"

The architecture of my current CNN is shown below.

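import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # class name assumed; the original snippet omitted the class line
    def __init__(self, num_out, kernel_size, num_input_filters):
        super().__init__()
        self.num_input_filters = num_input_filters
        self.num_out = num_out
        self.kernel_size = kernel_size

        # in_channels is hard-coded to 3, so 1-channel (grayscale) input raises the error above
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, inp):
        inp = self.pool(F.relu(self.conv1(inp)))
        return inp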

I have looked at similar questions, and fully convolutional networks (FCNs), which place no restriction on the input size, could be the solution. PyTorch provides ConvTranspose2d() for FCNs, but its parameters still seem to require a fixed input size. Is there any method that solves this problem?

gingertsai gingertsai · Accepted Answer · 2021-02-03T09:41:41

You can just convert the grayscale images to RGB by replicating the single channel three times.
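If the images are already batched as tensors, the same replication can be done directly in PyTorch. The following is a minimal sketch, assuming the [N, C, H, W] batch layout from the question; the helper name `to_three_channels` is hypothetical, and `Tensor.repeat` is used to copy the single channel.

```python
import torch
import torch.nn as nn


def to_three_channels(batch: torch.Tensor) -> torch.Tensor:
    """Return a (N, 3, H, W) batch; a grayscale (N, 1, H, W) batch is replicated.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    if batch.dim() != 4:
        raise ValueError("expected a 4D batch shaped (N, C, H, W)")
    channels = batch.size(1)
    if channels == 1:
        # Copy the single channel three times along the channel dimension.
        return batch.repeat(1, 3, 1, 1)
    if channels == 3:
        return batch
    raise ValueError(f"unsupported channel count: {channels}")


conv1 = nn.Conv2d(3, 6, 5)  # same first layer as in the question

gray = torch.rand(4, 1, 32, 32)   # grayscale batch that triggered the error
rgb = torch.rand(2, 3, 48, 64)    # color batch with a different spatial size

out_gray = conv1(to_three_channels(gray))
out_rgb = conv1(to_three_channels(rgb))
print(tuple(out_gray.shape))  # (4, 6, 28, 28)
print(tuple(out_rgb.shape))   # (2, 6, 44, 60)
```

Note that the convolution layer accepts both spatial sizes here: `Conv2d` fixes only the number of input channels, not the height and width, so the error in the question comes from the channel count alone.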

For your example, an image of shape [1, 32, 32] can be converted to [3, 32, 32] with the following code:

import numpy as np
images = np.concatenate((images,) * 3)  # joins along axis 0 by default

If the shape is [32, 32, 1], concatenate along the last axis instead:

images = np.concatenate((images,) * 3, axis=-1)

If the shape is [32, 32], you can use the code below to convert it to [32, 32, 3]:

img_shape = tuple(np.ones(len(images.shape), dtype=int))
img_shape += (3,)
images = np.tile(np.expand_dims(images, axis=-1), img_shape)
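The three cases above can be folded into one function that always returns the channel-first layout Conv2d expects. This is a minimal sketch under stated assumptions: the name `gray_to_rgb_chw` is hypothetical, it handles a single image rather than a batch, and it treats a leading axis of length 1 as the channel axis, which is ambiguous for images that are only one pixel high.

```python
import numpy as np


def gray_to_rgb_chw(image: np.ndarray) -> np.ndarray:
    """Convert one grayscale image to channel-first RGB with shape (3, H, W).

    Hypothetical helper for illustration. Accepted input shapes:
    (H, W), (1, H, W), or (H, W, 1).
    """
    if image.ndim == 2:
        # (H, W): add a leading channel axis.
        image = image[np.newaxis, ...]
    elif image.ndim == 3 and image.shape[0] == 1:
        # (1, H, W): already channel-first.
        pass
    elif image.ndim == 3 and image.shape[-1] == 1:
        # (H, W, 1): move the channel axis to the front.
        image = np.moveaxis(image, -1, 0)
    else:
        raise ValueError(f"unsupported grayscale shape: {image.shape}")
    # Replicate the single channel three times along axis 0.
    return np.repeat(image, 3, axis=0)


sample = np.arange(6).reshape(2, 3)
print(gray_to_rgb_chw(sample).shape)                    # (3, 2, 3)
print(gray_to_rgb_chw(sample[np.newaxis]).shape)        # (3, 2, 3)
print(gray_to_rgb_chw(sample[..., np.newaxis]).shape)   # (3, 2, 3)
```

Returning (3, H, W) rather than (H, W, 3) avoids a separate transpose before the array is turned into a tensor and stacked into a batch.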
