
I'm implementing a neural network with backward propagation. The weights are initialized to values in (-0.5, 0.5). However, after the first time the inputs are sent forward and the errors are propagated back, the weights between the input layer and the hidden layer grow to around 1000, sometimes even 2000.

The topology of the network consists of 3 layers: 1 input layer, 1 hidden layer, and 1 output layer. The input layer has 95 nodes, the hidden layer has 3 nodes, and the output layer has 2 nodes. The training data set has 40,000 entries, normalized with their z-scores.

After seeing such high numbers I doubted my implementation, but then again, with the learning rate set to 1 on the first pass, if each entry contributes around output * error = 0.25, which is reasonable, then a weight change of about 1000 seems plausible.
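To make that concrete, here is the back-of-the-envelope arithmetic as a small Python sketch; the 0.25 figure and the no-cancellation assumption are my rough estimates:

```python
# Back-of-the-envelope check of the numbers above. The 0.25
# per-entry (output * error) figure is a rough estimate, and
# treating every entry as pushing the weight the same way is a
# worst-case assumption.

learning_rate = 1.0
per_entry_grad = 0.25    # rough output * error per training entry
n_entries = 40_000

# Total weight drift after one full pass over the data set,
# if no per-entry updates cancel each other out:
total_drift = learning_rate * per_entry_grad * n_entries
print(total_drift)       # 10000.0

# Even with heavy cancellation, landing around 1000-2000 is
# plausible. A smaller learning rate shrinks it proportionally:
print(0.01 * per_entry_grad * n_entries)   # 100.0
```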

Anyway, are weights in a neural network supposed to be this high?

Thanks

What activation function are you using? – XCS
@Cristy 1/(1+e^(-input)) – lildoodilydo
What optimization algorithm are you using? For most of them, including the basic stochastic gradient descent, a learning rate of 1 is much, much, much too large, and your model may never converge with such a learning rate. See e.g. medium.com/octavian-ai/… for some discussion. – Peteris
1 is a pretty high learning rate. I suggest that you try something smaller, say 0.1, and see what happens. – Ray Tayek

1 Answer


A value that high isn’t necessarily a bad thing. Weights can be very high, or very low. They can even be zero!

Let’s say that you have two classes: A & B

Inputs for class A are typically around 0.00001. Inputs for class B are mostly the same, but some are around 0.001.

The input to a node is w * x

A) 0.00001 * 1000 = 0.01
B) 0.001 * 1000 = 1

When you feed a pre-activation like A's (0.01) into the sigmoid (your activation function), the output is almost exactly the baseline of 0.5 (sigmoid(0.01) ≈ 0.5025). The weight contributes essentially nothing; the signal dies.

But for a pre-activation like B's (1), the sigmoid output is noticeably larger (sigmoid(1) ≈ 0.73), so the signal is propagated forward.
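To make those numbers concrete, here is the example as a short Python sketch; the exact sigmoid outputs are worked out here rather than in the original example:

```python
import math

def sigmoid(z):
    # 1 / (1 + e^(-z)), the activation function from the question
    return 1.0 / (1.0 + math.exp(-z))

w = 1000.0  # the "high" weight from the example above

# Class A: typical input around 0.00001 -> pre-activation 0.01
print(sigmoid(0.00001 * w))  # ~0.5025, barely above the 0.5 baseline

# Class B: some inputs around 0.001 -> pre-activation 1
print(sigmoid(0.001 * w))    # ~0.7311, clearly above the baseline
```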

The values of your weights depend on many things:

  • the data
  • the problem being solved
  • your activation function choices
  • the number of neurons in each layer
  • the number of layers
  • the value of other weights!