
When training multi-layer neural networks with back-propagation, the weights of all layers are updated in each iteration.

I am wondering what happens if, in each iteration of back-propagation, we randomly select one layer and update only that layer's weights.

How would this impact training time? Would model performance (the generalization capability of the model) suffer from this type of training?

My intuition is that the generalization capability would stay the same while the training time would be reduced. Please correct me if I am wrong.


1 Answer


Your intuition is wrong. What you are proposing is a form of block coordinate descent, and while it can make sense to do something like this when the gradients of the blocks are not correlated, it does not make sense in this context.
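For reference, one step of such a block coordinate scheme can be sketched as follows (the notation is my own: $W_i$ for the weights of layer $i$, $\eta$ for the learning rate, $\mathcal{L}$ for the loss):

```latex
% One iteration: pick a layer index i (e.g. uniformly at random), then
% update only that layer's weights while all other layers stay fixed.
\[
  W_i^{(t+1)} = W_i^{(t)} - \eta \, \nabla_{W_i} \mathcal{L}\!\left(W_1^{(t)}, \dots, W_L^{(t)}\right),
  \qquad
  W_j^{(t+1)} = W_j^{(t)} \quad \text{for } j \neq i .
\]
```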

The problem with this in neural networks is that, because of the chain rule, you get the gradients of the preceding layers essentially for free while computing the gradient of any single layer. Updating only one layer therefore just discards that information for no good reason.
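To make this concrete, here is a minimal sketch of the proposed scheme in PyTorch (the network, layer sizes, and dummy batch are my own illustration, not from the question). Note that `loss.backward()` still propagates gradients through every layer, so the cost of an iteration is essentially unchanged; updating a single layer merely throws the other gradients away.

```python
import random
import torch
import torch.nn as nn

# Hypothetical small MLP; the layer sizes are made up for illustration.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
layers = [m for m in model if isinstance(m, nn.Linear)]
loss_fn = nn.MSELoss()
lr = 0.01

x, y = torch.randn(32, 20), torch.randn(32, 1)  # dummy batch

# One iteration of the proposed scheme: update only one randomly chosen layer.
model.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()  # the chain rule still computes gradients for *all* layers

chosen = random.choice(layers)
with torch.no_grad():
    for p in chosen.parameters():
        p -= lr * p.grad  # gradients of every other layer are simply discarded
```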