I want to make a CNN model in pytorch which can be fed images of different sizes. I am trying to use 2d convolution layer, which takes 4D input shape (pytorch's Conv2d expects its 2D inputs to actually have 4 dimensions). However, I'm not sure how to set up the input layer that can adjust all the variable sized images into fixed number of feature maps to pass over to remaining layers. For example, the shape of the input for colored images is [4, 3, 32, 32], which corresponds to batch size, number of channel(RGB), width, and height. If images are grayscale, then it will have [4, 1, 32, 32], which will produce an error when the shape is not what the layer expected.
Error message is "RuntimeError: Given groups=1, weight of size [6, 3, 5, 5], expected input[4, 1, 32, 32] to have 3 channels, but got 1 channels instead"
The architecture of my current CNN is like below.
def __init__(self, num_out, kernel_size, num_input_filters):
super().__init__()
self.num_input_filters = num_input_filters
self.num_out = num_out
self.kernel_size = kernel_size
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
def forward(self, inp):
inp = self.pool(F.relu(self.conv1(inp)))
return inp
I have referenced similar questions and Fully convolutional networks (FCN) have no limitations on the input size at all, which could be the solution. And pytorch provides ConvTranspose2d() for FCN, but its parameters still seem to require fixed input size. Are there any methods that can solve this problem?