
I am not sure how the convolutional neural network in this TensorFlow tutorial calculates the dimensions.

  1. The image has 28*28 pixels (x_image = tf.reshape(x, [-1,28,28,1]))
  2. The patch size is 5x5 (W_conv1 = weight_variable([5, 5, 1, 32]))
  3. The first convolutional layer is done by: (h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1))
  4. The shape after the first layer: h_pool1.get_shape() returns TensorShape([Dimension(10), Dimension(14), Dimension(14), Dimension(32)])

Q1: Why is the first dimension 10?

Q2: Why does the 5x5 patch size reduce the dimensions to 14x14? If I have a 28x28 image and apply a 5x5 patch around every pixel, I'd expect an output larger than 14x14.

Q3: What does -1 do in the code for x_image?


1 Answer


The shapes are (batch_size, height, width, channel).

Q1. 10 is your batch size. I guess you have a line like this:

x = tf.placeholder(tf.float32, shape=[10, 784])

While in the tutorial the line is:

x = tf.placeholder(tf.float32, shape=[None, 784])

This way, the batch size shows up as Dimension(None) instead of Dimension(10).
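For example, the difference is visible in the static shape of x_image (a minimal sketch using the TF 1.x graph API from the tutorial; under TF 2.x you would need tf.compat.v1):

import tensorflow as tf

# Unspecified batch size, as in the tutorial: the first dimension stays None.
x = tf.placeholder(tf.float32, shape=[None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])
print(x_image.get_shape())        # (?, 28, 28, 1)

# Fixed batch size of 10, which would explain the Dimension(10) in your output.
x_fixed = tf.placeholder(tf.float32, shape=[10, 784])
x_image_fixed = tf.reshape(x_fixed, [-1, 28, 28, 1])
print(x_image_fixed.get_shape())  # (10, 28, 28, 1)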

Q2. Layer 1 includes a convolution layer and a max-pooling layer. The convolution layer with "SAME" padding outputs a tensor with the same spatial size as its input. The size reduction comes from the 2x2 max-pooling with stride 2 and "SAME" padding, which outputs (h/2, w/2):

def conv2d(x, W):
    # Stride 1 with 'SAME' padding: the output keeps the input's spatial size.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 window with stride 2: each spatial dimension is halved.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)   # (batch, 28, 28, 32)
h_pool1 = max_pool_2x2(h_conv1)                            # (batch, 14, 14, 32)
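
To make the sizes concrete: with "SAME" padding the output size is ceil(input_size / stride), so the stride-1 convolution gives 28x28 and the stride-2 pooling gives ceil(28/2) = 14x14. Here is a small standalone sketch of the same shapes (TF 1.x API, with a hypothetical batch size of 10 and randomly initialized weights):

import tensorflow as tf

x_image = tf.placeholder(tf.float32, shape=[10, 28, 28, 1])
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))

# Stride-1 'SAME' convolution keeps the spatial size: 28x28 -> 28x28.
h_conv1 = tf.nn.relu(
    tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
print(h_conv1.get_shape())   # (10, 28, 28, 32)

# 2x2 max-pooling with stride 2 halves each spatial dimension: 28x28 -> 14x14.
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1],
                         strides=[1, 2, 2, 1], padding='SAME')
print(h_pool1.get_shape())   # (10, 14, 14, 32)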

Q3. tf.reshape() with a single dimension set to "-1" lets that dimension be calculated automatically so that the total number of elements stays the same.
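
For example (a small sketch with a fixed batch size of 10, so the inferred value is visible in the static shape):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[10, 784])

# The total element count must match: 10 * 784 = 10 * 28 * 28 * 1,
# so the -1 is inferred as 10.
x_image = tf.reshape(x, [-1, 28, 28, 1])
print(x_image.get_shape())   # (10, 28, 28, 1)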