While reading through the TensorFlow tutorial and API documentation, I could not work out how the shapes of the convolution input and filter arguments are defined. The method is tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None), where the input has shape [batch, in_height, in_width, in_channels] and the filter has shape [filter_height, filter_width, in_channels, out_channels]. If anyone could shed light on how to properly define the "in_channels" and "out_channels" sizes, that would be very helpful.
1 Answer
in_channels refers to the depth of the input to the convolutional layer. For example, if you feed the layer raw RGB images, the depth is 3, corresponding to the red, green, and blue channels. This means that the kernels are actually 3D rather than 2D: each one spans the full input depth. out_channels refers to the depth of the output, which equals the number of kernels in the layer. The picture referenced in the original answer illustrates an example with an input depth of 3 and an output depth of 5.
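To make the shape bookkeeping concrete, here is a minimal sketch in plain Python. The helper conv2d_output_shape is hypothetical (it is not part of TensorFlow); it only reproduces the shape arithmetic that tf.nn.conv2d documents for the default NHWC layout, and the 32x32 image size, 5x5 kernel size, and batch of 10 are assumptions chosen for illustration.

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides=(1, 1), padding="SAME"):
    """Hypothetical helper: shape arithmetic only, no convolution is computed.

    input_shape:  [batch, in_height, in_width, in_channels]
    filter_shape: [filter_height, filter_width, in_channels, out_channels]
    """
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, out_c = filter_shape

    # The third filter dimension must equal the depth of the input.
    if in_c != f_in_c:
        raise ValueError(
            "filter in_channels (%d) must match input in_channels (%d)" % (f_in_c, in_c)
        )

    s_h, s_w = strides
    if padding == "SAME":
        out_h = math.ceil(in_h / s_h)
        out_w = math.ceil(in_w / s_w)
    elif padding == "VALID":
        out_h = math.ceil((in_h - f_h + 1) / s_h)
        out_w = math.ceil((in_w - f_w + 1) / s_w)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")

    # The output depth is out_channels: one feature map per kernel.
    return [batch, out_h, out_w, out_c]


rgb_batch = [10, 32, 32, 3]  # 10 RGB images, so in_channels = 3
kernels = [5, 5, 3, 5]       # five 5x5 kernels, each 3 deep, so out_channels = 5

same_shape = conv2d_output_shape(rgb_batch, kernels, padding="SAME")
valid_shape = conv2d_output_shape(rgb_batch, kernels, padding="VALID")
print(same_shape)   # [10, 32, 32, 5]
print(valid_shape)  # [10, 28, 28, 5]
```

With real tensors of these shapes, the corresponding call would be tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding="SAME"). Only the last dimension of the result depends on out_channels.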
How to "properly define" these sizes is something determined through experiments. in_channels is fixed by the depth of whatever feeds the layer, while out_channels is a network design choice. You may read about some of the famous architectures, such as AlexNet and VGG-16, to see how network architectures are designed in practice.
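As a rough illustration of that design choice, the sketch below chains filter shapes so that each layer's out_channels becomes the next layer's in_channels. The helper stack_filter_shapes is hypothetical, and the widths 64, 64, 128, 128 are an assumption modeled on the first four 3x3 convolution layers of VGG-16 as commonly described; treat them as an example, not a recommendation.

```python
def stack_filter_shapes(input_channels, layer_widths, kernel_size=3):
    """Hypothetical helper: filter shapes for a plain stack of conv layers.

    Each shape is [filter_height, filter_width, in_channels, out_channels].
    """
    shapes = []
    in_c = input_channels
    for out_c in layer_widths:
        shapes.append([kernel_size, kernel_size, in_c, out_c])
        # The depth produced by this layer is the depth consumed by the next.
        in_c = out_c
    return shapes


# RGB input (depth 3) followed by four layers with assumed VGG-style widths.
filter_shapes = stack_filter_shapes(3, [64, 64, 128, 128])
for shape in filter_shapes:
    print(shape)
# [3, 3, 3, 64]
# [3, 3, 64, 64]
# [3, 3, 64, 128]
# [3, 3, 128, 128]
```

Only the first layer's in_channels comes from the data; every other value in the last two positions follows from the chosen widths.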