2 votes

When creating a convolutional neural network (CNN) (e.g. as described in https://cs231n.github.io/convolutional-networks/), the input layer is connected to one or several filters, each producing a feature map. Each neuron in a feature map is connected to only a few neurons of the input layer. In the simplest case, each of my n filters has the same dimensionality and uses the same stride.

My (closely related) questions are:

  1. How is it ensured that the filters learn different features, even though they are trained on the same patches?
  2. Does the feature a filter learns depend on the randomly assigned initial values (for weights and biases) when the network is initialized?
1. It is not "ensured"; empirically it just happens. 2. Yes, bad initial values might lead to convergence to a local minimum. – user2717954

1 Answer

2 votes

I'm not an expert, but I can speak a bit to your questions. To be honest, it sounds like you already have the right idea: it's specifically the initial randomization of the weights/biases in the filters that fosters their tendency to learn different features (although I believe randomness in the error backpropagated from higher layers of the network can play a role as well).

As @user2717954 indicated, there is no guarantee that the filters will learn unique features. However, each time the error of a training sample or batch is backpropagated to a given convolutional layer, the weights and biases of each filter are slightly modified to improve the overall accuracy of the network. Since the initial weights and biases are different in each filter, it's possible (and likely, given a suitable model) for most of the filters to eventually stabilize to values representing a robust set of unique features.
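To make the symmetry argument concrete, here is a toy numpy sketch (a hand-built 1-D "convolution" with two ReLU filters and a squared-error loss; all names and values are invented for illustration, not taken from any real framework). If two filters start with identical weights, they receive identical gradients at every step and can never diverge; random initialization breaks that tie:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                           # filter width
x = rng.normal(size=8)          # toy 1-D input "image"
t = rng.normal(size=8 - K + 1)  # toy regression target

def conv1d(x, w):
    """Valid cross-correlation of x with filter w."""
    n = len(x) - len(w) + 1
    return np.array([x[i:i + len(w)] @ w for i in range(n)])

def sgd_step(w1, w2, lr=0.01):
    """One SGD step on L = 0.5 * ||relu(x*w1) + relu(x*w2) - t||^2."""
    z1, z2 = conv1d(x, w1), conv1d(x, w2)
    err = np.maximum(z1, 0) + np.maximum(z2, 0) - t
    d1, d2 = err * (z1 > 0), err * (z2 > 0)   # backprop through each ReLU
    g1 = np.array([d1 @ x[k:k + len(d1)] for k in range(K)])
    g2 = np.array([d2 @ x[k:k + len(d2)] for k in range(K)])
    return w1 - lr * g1, w2 - lr * g2

# Identical init: both filters see identical gradients at every step,
# so no amount of training can ever make them different.
w1 = w2 = np.full(K, 0.1)
for _ in range(100):
    w1, w2 = sgd_step(w1, w2)
same_init_identical = np.allclose(w1, w2)

# Random init breaks the symmetry: the ReLU masks (and hence the
# gradients) differ, so the filters drift apart and specialize.
w1, w2 = 0.1 * rng.normal(size=K), 0.1 * rng.normal(size=K)
for _ in range(100):
    w1, w2 = sgd_step(w1, w2)
rand_init_identical = np.allclose(w1, w2)

print(same_init_identical, rand_init_identical)
```

The same logic applies per-channel in a real 2-D convolutional layer; this is exactly why standard initializers draw every filter's weights independently at random.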

This also demonstrates why, in addition to proper randomization of the weights, it's crucial to use convolutional layers with an adequate number of filters. Without enough filters, the network is fundamentally limited: there are important, useful patterns at the given level of abstraction that simply cannot be represented by the layer.
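As a toy illustration of that capacity point (hand-crafted filter values, not learned ones): with a single edge filter, any patch the filter is orthogonal to produces the same zero response as an empty patch, so downstream layers cannot tell them apart; adding a second filter separates them.

```python
import numpy as np

# Hand-crafted 3x3 edge detectors (illustrative values, not learned).
horiz = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], dtype=float)
vert = horiz.T
blank = np.zeros((3, 3))

def code(filters, patch):
    """Feature vector for a patch: one response per filter."""
    return np.array([(f * patch).sum() for f in filters])

# With only a horizontal-edge filter, a vertical edge and a blank
# patch both map to the response [0.] -- no downstream weights can
# ever distinguish them.
v1, b1 = code([horiz], vert), code([horiz], blank)

# A second filter makes the two patches distinguishable.
v2, b2 = code([horiz, vert], vert), code([horiz, vert], blank)

print(v1, b1)  # [0.] [0.]
print(v2, b2)  # [0. 6.] [0. 0.]
```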