0 votes

We know that a convolution layer in a CNN uses filters, and that different filters look for different information in the input image.

But let's say that in this SSD we have a prototxt file, and it specifies a convolution layer as

layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

Convolution layers in different networks (GoogLeNet, AlexNet, VGG, etc.) all look more or less the same. Just by looking at this specification, how can I tell which information of the input image the filters in this convolution layer try to extract?

EDIT: Let me clarify my question. I see the following two convolution layers in the prototxt file. They are from SSD.

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

Then I show their outputs here.

Data (the input image):

[input image]

The conv1_1 and conv2_1 output images are here and here.

So my query is: how do these two conv layers produce different outputs, even though their definitions in the prototxt file look the same?
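For reference, here is a minimal pycaffe sketch of how such intermediate outputs can be dumped and viewed (the file names deploy.prototxt, ssd.caffemodel and test.jpg are placeholders for your own SSD files, and mean subtraction is omitted for brevity):

import caffe
import matplotlib.pyplot as plt

caffe.set_mode_cpu()
net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)

# Preprocess one image into the 'data' blob (layout: N x C x H x W).
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))      # HWC -> CHW
transformer.set_raw_scale('data', 255)            # [0, 1] -> [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR
image = caffe.io.load_image('test.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', image)
net.forward()

# Each row: one layer; each column: one of its first 8 output channels.
for row, name in enumerate(['conv1_1', 'conv2_1']):
    feat = net.blobs[name].data[0]                # shape: (num_output, H, W)
    print(name, 'output shape:', feat.shape)
    for col in range(8):
        plt.subplot(2, 8, row * 8 + col + 1)
        plt.imshow(feat[col], cmap='gray')
        plt.axis('off')
plt.show()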

"Just look at that and how to understand, filters in this convolution layer try to extract which information of the input image?" Didn't get you? - Harsh Wardhan
I didn't get your question. Do you want to know what each filter on each layer is looking for? - FalconUA
In general, the first layers extract edge-like features (which are appropriate for accurate localization), but as you go deeper into the network, the filters mostly work on blob-shaped features, which are appropriate for discriminating objects from each other. - Hossein Kashiani
@HosseinKa Yes, that is what I meant. How can you tell that the first convolution is looking for edges and the following ones are looking for blob-shaped features? In the prototxt file they all look the same. How do you know which convolution is looking for which information? - batuman
@FalconUA I have updated the question in the EDIT. - batuman

2 Answers

4 votes

The filters at earlier layers represent low-level features like edges (these features retain higher spatial resolution for precise localization, and carry low-level visual information similar to the response maps of Gabor filters). On the other hand, the filters at the mid-level layers extract features like corners or blobs, which are more complex.


As you go deeper, you can no longer visualize and interpret these features directly, because filters in the mid-level and high-level layers are not connected to the input image. For instance, you can visualize and interpret the output of the first layer as edges, but when you go deeper and apply a second convolution layer to these extracted edges (the output of the first layer), you get something like edges of edges, which captures more semantic information and fewer fine-grained spatial details. In the prototxt file, all convolutions and other types of operations can resemble each other, but they extract different kinds of features because they sit at different depths in the network and have different learned weights.
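To see this concretely, here is a minimal sketch (assuming a pycaffe build; the file names are placeholders) that renders the learned conv1_1 kernels as small RGB patches. The conv2_1 kernels operate on 64 input channels rather than on the RGB image, so they cannot be displayed this way directly:

import caffe
import matplotlib.pyplot as plt

net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)

w = net.params['conv1_1'][0].data        # learned kernels, shape (64, 3, 3, 3)
w = (w - w.min()) / (w.max() - w.min())  # rescale to [0, 1] for display only

for i in range(w.shape[0]):
    plt.subplot(8, 8, i + 1)
    plt.imshow(w[i].transpose(1, 2, 0))  # CHW -> HWC: each kernel becomes a tiny RGB patch
    plt.axis('off')
plt.show()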

1 vote

"Convolution" layer differ not only in their parameters (e.g., kernel_size, stride, pad etc.) but also in their weights: the trainable parameters of the convolution kernels.
You see different output (aka "responses") because the weights of the filters are different.
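Here is a minimal sketch of inspecting those weights with pycaffe (file names are placeholders): the prototxt only fixes the architecture, while net.params holds the learned weights, which differ from layer to layer:

import caffe

net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)

for name in ['conv1_1', 'conv2_1']:
    weights, bias = net.params[name]     # trainable blobs of this layer
    print(name, 'weights:', weights.data.shape, 'bias:', bias.data.shape)
# Expected for the VGG-16 base of SSD:
#   conv1_1 weights: (64, 3, 3, 3)    -- 64 filters over the 3-channel input image
#   conv2_1 weights: (128, 64, 3, 3)  -- 128 filters over pool1's 64 channels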

See this answer regarding the difference between "data" blobs and "parameter/weights" blobs in caffe.