1 vote

I think convolution layers should be fully connected (see this and this). That is, each feature map should be connected to all feature maps in the previous layer. However, when I looked at this CNN visualization, the second convolution layer is not fully connected to the first. Specifically, each feature map in the second layer is connected to between 3 and all 6 of the feature maps in the first layer, and I don't see any pattern in it. My questions are (a short code sketch of what I mean by "connected" follows the list):

  1. Is it canonical/standard to fully connect convolution layers?
  2. What is the rationale for the partial connections in the visualization?
  3. Am I missing something here?
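
For concreteness, here is a minimal PyTorch sketch (my own illustration, not taken from the visualization) of the difference. A standard `Conv2d` connects every output feature map to all input feature maps; grouped convolution is one built-in way to get partial connectivity, although an irregular connection table like the one in the visualization would need a custom layer.

```python
import torch.nn as nn

# Standard convolution: every output feature map is connected to ALL
# input feature maps (full connectivity between feature maps).
full = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
print(full.weight.shape)     # torch.Size([16, 6, 5, 5]): each map sees 6 inputs

# Grouped convolution: each output map sees only the input maps in its
# group. With groups=2, every output map sees just 3 of the 6 inputs.
partial = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, groups=2)
print(partial.weight.shape)  # torch.Size([16, 3, 5, 5]): each map sees 3 inputs
```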
That's not a very impressive CNN, to be honest. I tried to draw a "3" three times, and in all cases neither the first nor the second guess was correct. - MSalters
@MSalters I guess it's mainly for educational purposes rather than practical use. That said, I'm still expecting to see something canonical/typical in it. - Lingxi

1 Answer

2 votes

Neural networks have the remarkable property that knowledge is not stored in any one specific place but in a distributed fashion across the whole network. If you take a working network, you can often cut out large parts and still get a network that works approximately as well.

A related effect is that the exact layout is not very critical. ReLU and sigmoid/tanh activation functions are mathematically very different, yet both work quite well. Similarly, the exact number of nodes in a layer matters less than you might expect.

Fundamentally, this relates to the fact that in training you optimize all weights to minimize your error function, or at least find a good local minimum. As long as there are enough weights and they are sufficiently independent, you can optimize the error function.

There is another effect to take into account, though. With too many weights and not enough training data, you cannot optimize the network well; regularization only helps so much. A key insight behind CNNs is that they have far fewer weights than a fully connected network, because each node in a CNN is connected only to a small local neighborhood of nodes in the prior layer.
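
To make the weight counts concrete, here is a back-of-the-envelope comparison; the layer sizes are hypothetical, chosen only for illustration.

```python
# Hypothetical sizes: 6 feature maps of 28x28 feeding 16 feature maps
# of 28x28, with 5x5 kernels; biases ignored throughout.
in_maps, out_maps, h, w, k = 6, 16, 28, 28, 5

# Fully connected layer: every unit connects to every unit.
dense_weights = (in_maps * h * w) * (out_maps * h * w)  # 59,006,976

# Convolutional layer: one shared 5x5 kernel per (input map, output map)
# pair, reused at every spatial position.
conv_weights = out_maps * in_maps * k * k               # 2,400

print(dense_weights, conv_weights)
```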

So this particular CNN has even fewer connections than a CNN in which all feature maps are connected, and therefore fewer weights. That allows you to have more and/or bigger maps for a given amount of training data. Is that the best solution? Perhaps; choosing the best layout is still a bit of a black art. But it is not a priori unreasonable.
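
The 3-to-6 range in the question suggests the visualization follows the hand-designed connection table of LeNet-5 (LeCun et al., 1998, Table I), whose stated motivation was exactly this: keep the number of connections in check, and break symmetry so different maps are pushed to learn different features. A sketch of the saving, assuming that table:

```python
k = 5  # 5x5 kernels, as in LeNet-5

# Full feature-map connectivity: 16 output maps each see all 6 input maps,
# i.e. 96 map-to-map connections.
full_weights = 16 * 6 * k * k                      # 2,400 weights

# LeNet-5's partial table: 6 maps see 3 inputs, 9 maps see 4, and 1 sees
# all 6, i.e. only 60 map-to-map connections.
partial_weights = (6 * 3 + 9 * 4 + 1 * 6) * k * k  # 1,500 weights

print(full_weights, partial_weights)
```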