TensorFlow tf.nn.conv2d clarification

Question

While reading through the TensorFlow tutorial and API documentation, I could not work out how the shapes of the convolution's input and filter arguments are defined. The method is tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None), where input has shape [batch, in_height, in_width, in_channels] and filter has shape [filter_height, filter_width, in_channels, out_channels]. If anyone could shed light on how to properly define the in_channels and out_channels sizes, that would be very helpful.

Ali Ali · Accepted Answer · 2017-04-17T00:49:29

in_channels refers to the depth of the input to the convolutional layer. For example, if you feed the layer raw RGB images, the depth is 3, corresponding to the red, green, and blue channels. This means that the kernels are actually 3D rather than 2D: each kernel spans every input channel. out_channels refers to the depth of the output, which equals the number of kernels in the layer. For instance, with an input depth of 3 and an output depth of 5, the layer holds 5 kernels, each of depth 3.
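To make the shape rule concrete, here is a minimal pure-Python sketch that only does the shape bookkeeping; it does not call TensorFlow. The helper name conv2d_output_shape is hypothetical, it assumes the default NHWC layout from the question, and it takes strides as a (height, width) pair rather than the 4-element list that tf.nn.conv2d expects. The 10x28x28 batch and 5x5 kernel sizes are made up for illustration.

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides=(1, 1), padding="SAME"):
    """Shape bookkeeping for a 2-D convolution (hypothetical helper, not a TensorFlow API).

    input_shape:  [batch, in_height, in_width, in_channels]
    filter_shape: [filter_height, filter_width, in_channels, out_channels]
    """
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, out_c = filter_shape

    # The filter's third dimension must match the depth of the input.
    if in_c != f_in_c:
        raise ValueError(
            "filter in_channels (%d) must equal input depth (%d)" % (f_in_c, in_c)
        )

    s_h, s_w = strides
    if padding == "SAME":
        # Input is zero-padded so the spatial size depends only on the stride.
        out_h = math.ceil(in_h / s_h)
        out_w = math.ceil(in_w / s_w)
    elif padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        out_h = math.ceil((in_h - f_h + 1) / s_h)
        out_w = math.ceil((in_w - f_w + 1) / s_w)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")

    # The output depth is the number of kernels, i.e. out_channels.
    return [batch, out_h, out_w, out_c]


# A batch of 10 RGB images (depth 3) and five 5x5 kernels, each of depth 3.
rgb_batch = [10, 28, 28, 3]
filters = [5, 5, 3, 5]

print(conv2d_output_shape(rgb_batch, filters))                   # [10, 28, 28, 5]
print(conv2d_output_shape(rgb_batch, filters, padding="VALID"))  # [10, 24, 24, 5]
```

Note that in_channels never appears in the output shape: every kernel sums over all input channels, so only out_channels survives as the output depth.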

How to "properly define" out_channels is something settled by experiment; it is a network design issue. in_channels, by contrast, is fixed by the data or by the output depth of the previous layer. You may read about some of the famous architectures like AlexNet and VGG-16 to see how network architectures are designed in practice.
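As a rough sketch of how these choices chain together in a stack of layers: once you pick each layer's out_channels, the in_channels of the next layer follows automatically. The helper name filter_shapes is hypothetical, and the widths 64, 128, 256 are illustrative values loosely following the early blocks of VGG-16, not a recommendation.

```python
def filter_shapes(input_channels, layer_widths, kernel_size=3):
    """List the filter shape of each layer in a simple stack of conv layers.

    Hypothetical helper: layer_widths holds the chosen out_channels per layer.
    """
    shapes = []
    in_c = input_channels
    for out_c in layer_widths:
        # [filter_height, filter_width, in_channels, out_channels]
        shapes.append([kernel_size, kernel_size, in_c, out_c])
        # The next layer consumes this layer's output depth.
        in_c = out_c
    return shapes


# RGB input (depth 3) followed by three layers with illustrative widths.
for shape in filter_shapes(3, [64, 128, 256]):
    print(shape)
# [3, 3, 3, 64]
# [3, 3, 64, 128]
# [3, 3, 128, 256]
```

Only the list of widths is a design decision here; every in_channels value is derived from the input data or from the layer before it.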
