
I'm fairly new to deep learning, Python, and PyTorch, so please bear with me!

I'm trying to understand transfer learning in PyTorch using two different pretrained networks: VGG11 and DenseNet121. I've run data of shape (3 x 224 x 224) through the "features" part of each network, and the output shapes are as follows:

VGG11 features output shape: 512 x 7 x 7

DenseNet121 features output shape: 1024 x 7 x 7
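For reference, here is roughly how I checked those shapes. This is a minimal sketch assuming torchvision's pretrained models (newer torchvision versions take a `weights=` argument instead of `pretrained=True`):

```python
import torch
from torchvision import models

# Assumes torchvision's model zoo; pretrained weights aren't
# strictly needed just to inspect shapes.
vgg11 = models.vgg11(pretrained=True)
densenet121 = models.densenet121(pretrained=True)

# A dummy batch containing one 3 x 224 x 224 image
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    print(vgg11.features(x).shape)        # torch.Size([1, 512, 7, 7])
    print(densenet121.features(x).shape)  # torch.Size([1, 1024, 7, 7])
```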

Now I'm trying to write my own classifier to use instead of the pretrained one. Upon checking both pretrained classifiers, I see that the VGG11 classifier has as its first layer:

(0): Linear(in_features=25088, out_features=4096, bias=True)

While the DenseNet121 classifier is:

(classifier): Linear(in_features=1024, out_features=1000, bias=True)

The VGG one makes sense, since if you flatten the output of the "features" part, you get 512 x 7 x 7 = 25,088.

How does the DenseNet one have only 1024 input features? If you flatten the output of its "features" part, you get 1024 x 7 x 7 = 50,176.

Are there steps that I am missing for either of them? Are there ways to check the input and output shapes of each layer and find out exactly what's happening?

Thank you.


1 Answer


As mentioned in Table 1 of the DenseNet paper, DenseNet-121 ends with global average pooling, an extreme form of pooling in which a tensor of dimensions d x h x w is averaged over its spatial dimensions and reduced to d x 1 x 1. That is the step you're missing: the 1024 x 7 x 7 feature map is pooled down to a 1024-dimensional vector before it reaches the classifier, which is why in_features=1024. In torchvision's implementation this pooling happens in the model's forward method (via F.adaptive_avg_pool2d), not inside the features module, so you don't see it when you run features on its own.
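You can verify this step by step. Here's a minimal sketch that mirrors what torchvision's DenseNet forward method does after the features module (assuming torchvision's densenet121):

```python
import torch
import torch.nn.functional as F
from torchvision import models

densenet121 = models.densenet121(pretrained=True)

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = densenet121.features(x)           # 1 x 1024 x 7 x 7
    out = F.relu(feats)                       # torchvision applies a ReLU here
    out = F.adaptive_avg_pool2d(out, (1, 1))  # global average pool -> 1 x 1024 x 1 x 1
    out = torch.flatten(out, 1)               # 1 x 1024
    logits = densenet121.classifier(out)      # 1 x 1000
```

As for checking the input and output shapes of each layer: one way is to register forward hooks. This sketch prints the output shape of every top-level module in features (the `print_shapes` helper is my own, hypothetical name):

```python
def print_shapes(module, inputs, output):
    # inputs is a tuple of the positional arguments passed to the module
    print(module.__class__.__name__,
          tuple(inputs[0].shape), '->', tuple(output.shape))

for layer in densenet121.features:
    layer.register_forward_hook(print_shapes)

with torch.no_grad():
    densenet121(torch.randn(1, 3, 224, 224))
```

With hooks like this you can see exactly where the spatial dimensions shrink, and confirm that a custom DenseNet-121 classifier should take 1024 input features, while a custom VGG11 classifier (whose avgpool keeps a 7 x 7 output before flattening) should take 25,088.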