
After my previous attempt, I managed to train a neural network to express the sine function. I used the ai4r Ruby gem:

require 'ai4r'
srand 1
net = Ai4r::NeuralNetwork::Backpropagation.new([1, 60, 1])
net.learning_rate = 0.01
#net.propagation_function = lambda { |x| 1.0 / ( 1.0 + Math::exp( -x ) ) }

def normalise(x, xmin, xmax, ymin, ymax)
  xrange = xmax - xmin
  yrange = ymax - ymin
  return ymin + (x - xmin) * (yrange.to_f / xrange)
end

training_data = Array.new
test = Array.new
i2 = 0.0
320.times do |i|
  i2 += 0.1
  hash = Hash.new
  output = Math.sin(i2.to_f)
  input = i2.to_f
  hash.store(:input,[normalise(input,0.0,32.0,0.0,1.0)])
  hash.store(:expected_result,[normalise(output,-1.0,1.0,0.0,1.0)])
  training_data.push(hash)
  test.push([normalise(output,-1.0,1.0,0.0,1.0)])
end
puts "#{test}"
puts "#{training_data}"

time = Time.now
999999.times do |i|
  error = 0.0
  training_data.each do |d|
    error+=net.train(d[:input], d[:expected_result])
  end
  if error < 0.26
    break
  end
  print "Times: #{i}, error: #{error} \r"
end
time2 = Time.now
puts "#{time2}-#{time} = #{time2-time} seconds elapsed."

serialized = Marshal.dump(net)
File.open("net.saved", "w+") { |file| file.write(serialized) }

Everything worked out fine. The network was trained in 4703.664857 seconds.
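To reuse the saved network later, the Marshal dump can be read back in and queried. Here is a minimal sketch (the `denormalise` helper is my own addition, not part of the original script; it simply inverts the `normalise` mapping used during training):

```ruby
# Invert the 0..1 network-space value back into the original range.
# This is the inverse of normalise(x, xmin, xmax, 0.0, 1.0).
def denormalise(y, ymin, ymax)
  ymin + y * (ymax - ymin)
end

# Loading and querying the trained net would look like this
# (commented out because it needs the "net.saved" file on disk):
# net = Marshal.load(File.read("net.saved"))
# raw = net.eval([normalise(3.0, 0.0, 32.0, 0.0, 1.0)]).first
# prediction = denormalise(raw, -1.0, 1.0)  # should be close to Math.sin(3.0)

# Sanity check of the mapping itself:
# 0.5 in network space corresponds to 0.0 in sine space.
puts denormalise(0.5, -1.0, 1.0)
```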

[Plot: sine function vs. neural network output]

The network trains much faster when I normalise the input/output to numbers between 0 and 1. ai4r uses a sigmoid activation function, so it clearly cannot output negative values. But why do I have to normalise the input values? Does this kind of neural network only accept input values less than 1?

In the sine example, is it possible to input any number as in:

Input: -10.0 -> Output: 0.5440211108893699
Input: 87654.322 -> Output: -0.6782453567239783
Input: -9878.923 -> Output: -0.9829544956991526

or do I have to define the range?
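(One workaround I can think of, sketched below as an assumption rather than something the gem provides: since sine is periodic with period 2π, any input can be wrapped into [0, 2π) before normalising, so the network only ever sees a range it was trained on.)

```ruby
TWO_PI = 2 * Math::PI

# Reduce an arbitrary input to the equivalent angle in [0, 2*PI).
# Ruby's Float#% returns a non-negative result for a positive modulus,
# so negative inputs like -10.0 are handled correctly too.
def wrap_input(x)
  x % TWO_PI
end

puts Math.sin(wrap_input(-10.0))     # same value as Math.sin(-10.0)
puts Math.sin(wrap_input(87654.322)) # same value as Math.sin(87654.322)
```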


1 Answer


In your structure you have 60 hidden nodes after a single input. This means that each hidden node has only one incoming weight, for a total of 60 learned values. The connection from the hidden layer to the single output node likewise has 60 weights. That gives a total of 120 learnable parameters (plus bias terms, if those are enabled).
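For concreteness, that weight count falls directly out of the [1, 60, 1] layer structure (a back-of-the-envelope sketch; any bias terms the library adds would come on top of this):

```ruby
layers = [1, 60, 1]

# Each consecutive pair of layers contributes (nodes_in * nodes_out)
# weights to the network.
weights = layers.each_cons(2).sum { |a, b| a * b }

puts weights  # 1*60 + 60*1 = 120
```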

Imagine what each node in the hidden layer is capable of learning: a single scaling factor, followed by a non-linearity. Let's assume that your weights end up looking like:

[1e-10, 1e-9, 1e-8, ..., .1]

with each entry being the weight of a node in the hidden layer. Now if you pass in the number 1 to your network your hidden layer will output something to this effect:

[0, 0, 0, 0, ..., .1, .25, .5, .75, 1] (roughly speaking, not actually calculated)

Likewise, if you give it something large, like 1e10, then the first layer would give:

[0, .25, .5, .75, 1, 1, 1, ..., 1].

The weights of your hidden layer will learn to spread out in this fashion so the network can handle a large range of inputs by scaling them down to a smaller one. The more hidden nodes you have in that first layer, the less far apart each node has to sit. In my example they are spaced by a factor of ten; if you had thousands of them, they might be spaced by a factor of only two.
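You can see roughly this effect numerically with the standard logistic sigmoid (the same function the commented-out `propagation_function` in the question defines; note that an unsaturated sigmoid node outputs values near 0.5 rather than 0, but the saturation pattern is the same):

```ruby
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Hidden-layer weights spaced by factors of ten: [1e-10, 1e-9, ..., 0.1]
weights = (0..9).map { |k| 10.0**(-10 + k) }

small = weights.map { |w| sigmoid(w * 1.0).round(3) }
large = weights.map { |w| sigmoid(w * 1e10).round(3) }

p small  # mostly ~0.5: a small input barely moves any node
p large  # mostly 1.0: a huge input saturates almost every node
```

Only the nodes whose weight scales the input into the sigmoid's sensitive region (inputs roughly between -5 and 5) produce a distinguishable output; the rest are pinned near 0.5 or 1.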

By normalizing the input range to [0, 1], you restrict how far those hidden nodes need to spread before they can start giving meaningful information to the final layer. This allows for faster training (assuming your stopping condition is based on the change in loss).

So to directly answer your questions: No, you do not need to normalize, but it certainly helps speed up training by reducing the variability and size of the input space.