5 votes

Consider a Convolutional Neural Network with the following architecture:

[CNN architecture diagram: C_1 → P_1 → C_2 → P_2 → softmax]

Here C_i refers to the i-th convolutional layer and P_i refers to the i-th mean-pooling layer. Each layer has a corresponding output. Let delta^P_j refer to the error in the output of layer P_j (and similarly delta^C_j for layer C_j).

delta^P_2 can be calculated easily using the normal backpropagation equations, since P_2 is fully connected to the softmax layer. delta^C_2 can be calculated simply by upsampling delta^P_2 appropriately (and multiplying by the gradient of the output of C_2), since we are using mean pooling.
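For concreteness, the mean-pooling part of that step can be sketched as follows (NumPy is used here purely for illustration, with a pool size of 2 assumed):

    import numpy as np

    def upsample_mean_pool(delta_pooled, pool=2):
        # Spread the error at the pooled output evenly over each pooling block.
        # Every input unit contributed 1/(pool*pool) to the pooled value, so it
        # receives that fraction of the pooled error back.
        return np.kron(delta_pooled, np.ones((pool, pool))) / (pool * pool)

    # Per channel: delta^C_2 = upsample_mean_pool(delta^P_2) * f'(z^C_2)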

How do we propagate error from the output of C_2 to the output of P_1? In other words, how do we find delta^P_1 from delta^C_2?

Stanford's UFLDL Deep Learning tutorial uses the following equation to do this:

delta_k^l = upsample( (W_k^l)^T delta_k^(l+1) ) .* f'(z_k^l)

However I am facing the following problems in using this equation:

  1. My W_k^l has size (2x2) and delta_k^(l+1) has size (6x6) (I am using valid convolution; the output of P_1 has size (13x13) and the output of P_2 has size (6x6)). This inner matrix multiplication does not even make sense in my case.

  2. The equation assumes that the number of channels in both layers is the same. Again, this is not true for me: the output of P_1 has 64 channels while the output of C_2 has 96 channels.

What am I doing wrong here? Can anybody please explain how to propagate errors through a convolutional layer?

A simple MATLAB example would be highly appreciated.


2 Answers

2 votes

A good point to note here is that pooling layers do not do any learning themselves. The function of the pooling layer is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.

During forward propagation, a P by P pooling block is reduced to a single value, i.e. the value of the “winning unit”. To keep track of the “winning unit”, its index is noted during the forward pass and used for gradient routing during backpropagation.

During backpropagation, the gradients in the convolutional layers are calculated, and the backward pass through the pooling layer then assigns the gradient value from the convolutional layer to the “winning unit”, whose index was noted during the forward pass.

Gradient routing is done in the following ways:

  • Max-pooling - the error is assigned only to where it came from, the “winning unit”, because the other units in the previous layer’s pooling block did not contribute to it; hence all the other units are assigned a value of zero.

  • Average pooling - the error is multiplied by 1 / (P by P) and assigned to the whole pooling block (all units get this same value); a short sketch of both rules follows below.
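A minimal NumPy sketch of both routing rules, assuming non-overlapping P by P blocks and a single channel (the winning unit is recomputed here for brevity; in practice its cached index from the forward pass is used):

    import numpy as np

    def max_pool_backward(x, grad_out, P=2):
        # Route each pooled gradient to the "winning unit" of its P x P block;
        # every other unit in the block gets zero.
        grad_in = np.zeros_like(x)
        for i in range(grad_out.shape[0]):
            for j in range(grad_out.shape[1]):
                block = x[i*P:(i+1)*P, j*P:(j+1)*P]
                r, c = np.unravel_index(np.argmax(block), block.shape)
                grad_in[i*P + r, j*P + c] = grad_out[i, j]
        return grad_in

    def mean_pool_backward(grad_out, P=2):
        # Spread each pooled gradient evenly over its P x P block.
        return np.kron(grad_out, np.ones((P, P))) / (P * P)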

Read a more comprehensive breakdown of the whole backpropagation procedure here

0 votes

I think the dimensions of your layers and weights are quite different from what you think. If "output of P_1 has 64 channels while output of C_2 has 96 channels" and your convolution is 2x2, then W is not 2x2; it is 96x64x2x2 (a rank-4 tensor; the convention for the order of the dimensions/indices may vary, but you get the idea). The "inner matrix multiplication" is actually a tensor convolution. Going forward, you would convolve W, which is 96x64x2x2, with the input, which is (I assume) 64x7x7, to produce an output which is 96x6x6 (this assumes "valid" convolution with no padding, which is why sliding 2x2 over 7x7 produces 6x6). Going backwards, you would convolve the transpose of W with the output error to produce something with the same dimensions as the input.
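A rough NumPy/SciPy sketch of that forward and backward pass, using the shapes assumed above (96x64x2x2 weights, a 64x7x7 input); the activation derivative is taken to be already folded into the output error:

    import numpy as np
    from scipy.signal import convolve2d

    C_in, C_out, K = 64, 96, 2
    x = np.random.randn(C_in, 7, 7)           # input, e.g. the output of P_1
    W = np.random.randn(C_out, C_in, K, K)    # rank-4 weight tensor
    delta_out = np.random.randn(C_out, 6, 6)  # error at this layer's output

    # Forward: each output channel is a sum of "valid" 2-D convolutions
    # over all input channels.
    out = np.zeros((C_out, 6, 6))
    for j in range(C_out):
        for k in range(C_in):
            out[j] += convolve2d(x[k], W[j, k], mode='valid')

    # Backward: sum over output channels, convolving each output error map
    # with the flipped kernel in "full" mode; the result has the input's shape.
    delta_in = np.zeros_like(x)
    for k in range(C_in):
        for j in range(C_out):
            delta_in[k] += convolve2d(delta_out[j], W[j, k][::-1, ::-1], mode='full')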

Mean pooling just makes things a bit more complicated; first try to understand this without pooling, or model the pooling as a convolution with fixed (equal) weights.
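To see pooling as "a convolution with fixed (equal) weights", per channel it is just this (a sketch assuming 2x2 mean pooling with stride 2):

    import numpy as np
    from scipy.signal import convolve2d

    def mean_pool_as_conv(x, P=2):
        # Mean pooling = convolution with a uniform P x P kernel,
        # keeping only every P-th output (stride P, non-overlapping blocks).
        kernel = np.ones((P, P)) / (P * P)
        return convolve2d(x, kernel, mode='valid')[::P, ::P]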