
While reading through the TensorFlow tutorial and API documentation, I could not work out how the shapes of the convolution input and filter arguments are defined. The method is tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None), where input has shape [batch, in_height, in_width, in_channels] and filter has shape [filter_height, filter_width, in_channels, out_channels]. If anyone could shed light on how to properly choose the in_channels and out_channels sizes, that would be very helpful.


1 Answer


in_channels refers to the depth of the input to the convolutional layer. For example, if you feed the layer raw RGB images, the depth is 3, corresponding to the red, green, and blue channels. This means that each kernel is actually 3D rather than 2D: it spans the full depth of the input. out_channels refers to the depth of the output, which equals the number of kernels in the layer. The following picture illustrates an example with an input depth of 3 and an output depth of 5:

[Image: convolution with an input depth of 3 and an output depth of 5]
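To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of a convolution that uses the same layouts as the question: [batch, in_height, in_width, in_channels] for the input and [filter_height, filter_width, in_channels, out_channels] for the filter. It is not TensorFlow's implementation. The names conv2d_valid and shape are hypothetical helpers, and stride 1 with no padding is assumed to keep it short.

```python
def conv2d_valid(inputs, filters):
    """Naive convolution on nested lists (assumes stride 1, no padding).

    inputs:  [batch][in_height][in_width][in_channels]
    filters: [filter_height][filter_width][in_channels][out_channels]
    """
    batch, in_h, in_w = len(inputs), len(inputs[0]), len(inputs[0][0])
    in_c = len(inputs[0][0][0])
    f_h, f_w = len(filters), len(filters[0])
    f_in_c, out_c = len(filters[0][0]), len(filters[0][0][0])
    if f_in_c != in_c:
        raise ValueError("filter in_channels must equal the input depth")

    out_h, out_w = in_h - f_h + 1, in_w - f_w + 1
    out = [[[[0.0] * out_c for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(batch)]
    for b in range(batch):
        for y in range(out_h):
            for x in range(out_w):
                for k in range(out_c):  # one output channel per kernel
                    total = 0.0
                    for dy in range(f_h):
                        for dx in range(f_w):
                            for c in range(in_c):  # sum over the full input depth
                                total += inputs[b][y + dy][x + dx][c] * filters[dy][dx][c][k]
                    out[b][y][x][k] = total
    return out


def shape(tensor):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims


# One 4x4 RGB image (depth 3) and five 2x2 kernels, each spanning all 3 channels.
image = [[[[1.0] * 3 for _ in range(4)] for _ in range(4)]]
kernels = [[[[1.0] * 5 for _ in range(3)] for _ in range(2)] for _ in range(2)]

result = conv2d_valid(image, kernels)
print(shape(image), shape(kernels), shape(result))
```

The innermost loop over c is why each kernel is 3D: every output value sums over the whole input depth, so in_channels disappears from the output shape and out_channels takes its place.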

As for how to properly define them: in_channels is not a free choice, since it must match the depth of the layer's input. out_channels is chosen by experiment; it is a network design question. You may want to read about well-known architectures such as AlexNet and VGG-16 to see how networks are designed in practice.
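One consequence worth spelling out: when convolutional layers are stacked, each layer's in_channels is simply the previous layer's out_channels, so only the out_channels values are design decisions. The sketch below derives the filter shapes for a small stack. The helper filter_shapes is hypothetical, and the 3x3 filter size and the widths 64, 128, 256 are illustrative assumptions rather than a recommendation.

```python
def filter_shapes(input_depth, out_channels_per_layer, filter_size=3):
    """Return [filter_height, filter_width, in_channels, out_channels] per layer."""
    shapes = []
    in_channels = input_depth  # e.g. 3 for raw RGB images
    for out_channels in out_channels_per_layer:
        shapes.append([filter_size, filter_size, in_channels, out_channels])
        in_channels = out_channels  # this layer's output depth feeds the next layer
    return shapes


# RGB input followed by three layers with assumed widths 64, 128, 256.
shapes = filter_shapes(3, [64, 128, 256])
for layer_index, filter_shape in enumerate(shapes, start=1):
    print(layer_index, filter_shape)
```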