How can a neural network learn a continuous rather than a discrete function?

Question

What I'm trying to do is get a neural network to 'learn' the function f(x) = x^2. I'm basing this off the code from this source. The neural network is hand-coded in C# if that helps (using doubles).

The idea is that I feed the network pairs of reals (e.g. (1, 1), (2, 4), (3, 9)) and then have it output the correct square when given an unseen real. The input is given by 1 input neuron holding the value of the real, and the output is read from the output layer (also 1 neuron). There are 4 neurons in the hidden layer.

My problem is that the output of the output neuron is between 0 and 1 (I'm using the sigmoid function). I learnt neural networks from this source, where they were outputting discrete values (the handwriting image represents 0, 1, ... or 9). The way I got round this was by using the function tan(pi * (2x - 1) / 2) and its inverse. This function maps (0,1) to the reals, so I applied its inverse to the training targets. When I feed the network data, I give it x together with the inverse of the mapping function applied to x^2.
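The mapping described above can be sketched as follows. This is only an illustration in Python rather than the original C# code, and the names `to_reals` and `to_unit` are made up for the example. It shows that the round trip works for moderate targets, while an encoded value very close to 1 decodes to a very large real number, so tiny errors in the sigmoid output near 1 turn into large errors in the decoded square.

```python
import math

def to_reals(u):
    """Map u in (0, 1) onto the real line: tan(pi * (2u - 1) / 2)."""
    return math.tan(math.pi * (2.0 * u - 1.0) / 2.0)

def to_unit(t):
    """Inverse of to_reals: map a real number back into (0, 1)."""
    return math.atan(t) / math.pi + 0.5

# Encode the targets x^2 into (0, 1) so a sigmoid output neuron can represent them,
# then decode again to check the round trip.
for x in (1.0, 2.0, 3.0, 100.0):
    encoded = to_unit(x * x)
    print(x, encoded, to_reals(encoded))

# An output very close to 1 decodes to a huge value.
print(to_reals(0.999999996))
```

Because the encoded targets crowd together near 1 as x grows, the network has to resolve ever smaller differences in its output to distinguish large squares.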

This seems to have real problems with numbers at the upper end of (0,1) (e.g. 0.999999996), which map to huge real numbers (I don't think doubles are precise enough). Is this the standard way of doing things, or is there a better way? Another idea was to use many input and output neurons and give them a binary vector (e.g. with 4 input neurons, (0, 0, 0, 1) is an input of 1). A further idea was to use decimals, which are more precise than doubles.

Is this task even a good use of neural networks, or is it not really an appropriate application?

In C#, decimal carries more significant digits than double (roughly 28 versus 15 to 17), but double covers a far larger range. Using the tan function creates new issues because it rapidly approaches infinity, as you found out. Normally with tan a limit is set, so that when the number goes above or below a certain value the result is forced to 0 or 1. - jdweng

Marcin Możejko · Accepted Answer · 2016-07-06T10:43:39

Yes, the task you describe is a relatively easy and well-known example.

What you have to do is use a linear activation instead of a sigmoid activation in the final layer. In this case the output is simply a linear (affine) combination of the outputs of the hidden-layer units. You also have to change your loss function to e.g. MSE, which is designed for real-valued targets rather than targets confined to the (0,1) interval.
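A minimal sketch of this setup, written in plain Python rather than the asker's C# code: 1 input, 4 sigmoid hidden units, a linear output, and per-sample gradient descent on the squared error. The function names, learning rate, epoch count, seed, and the training grid on [-2, 2] are all assumptions chosen for illustration, not part of the original post.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(hidden_size=4, seed=0):
    """Random initial weights; the seed is fixed only to make the sketch repeatable."""
    rng = random.Random(seed)
    w_h = [rng.uniform(-1, 1) for _ in range(hidden_size)]  # input -> hidden weights
    b_h = [rng.uniform(-1, 1) for _ in range(hidden_size)]  # hidden biases
    w_o = [rng.uniform(-1, 1) for _ in range(hidden_size)]  # hidden -> output weights
    return w_h, b_h, w_o, 0.0                               # last entry: output bias

def predict(params, x):
    """Forward pass: sigmoid hidden layer, then a linear (identity) output."""
    w_h, b_h, w_o, b_o = params
    hidden = [sigmoid(w * x + b) for w, b in zip(w_h, b_h)]
    out = b_o + sum(w * h for w, h in zip(w_o, hidden))  # no sigmoid on the output
    return out, hidden

def mse(params, data):
    return sum((predict(params, x)[0] - y) ** 2 for x, y in data) / len(data)

def train(params, data, lr=0.05, epochs=1000):
    """Per-sample gradient descent on 0.5 * (out - y)^2; returns new parameters."""
    w_h, b_h, w_o, b_o = list(params[0]), list(params[1]), list(params[2]), params[3]
    for _ in range(epochs):
        for x, y in data:
            out, hidden = predict((w_h, b_h, w_o, b_o), x)
            err = out - y  # with a linear output there is no extra sigmoid' factor
            for j in range(len(w_h)):
                # Backpropagate through hidden unit j using the pre-update w_o[j].
                grad_hidden = err * w_o[j] * hidden[j] * (1.0 - hidden[j])
                w_o[j] -= lr * err * hidden[j]
                w_h[j] -= lr * grad_hidden * x
                b_h[j] -= lr * grad_hidden
            b_o -= lr * err
    return w_h, b_h, w_o, b_o

# Training pairs (x, x^2) on a grid over [-2, 2]; the targets go up to 4.
data = [(i / 10.0, (i / 10.0) ** 2) for i in range(-20, 21)]
start = init_params()
trained = train(start, data)
print("MSE before:", mse(start, data), "after:", mse(trained, data))
print("prediction at x = 1.5:", predict(trained, 1.5)[0])
```

The targets are used directly, with no squashing into (0,1), because the output neuron is no longer bounded. Note that a small network like this can only be expected to approximate x^2 on the interval it was trained on.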

UPDATE: let's assume that y1, y2, y3, y4 are the activations of the hidden-layer nodes. Then an affine activation has the form:

w0 + w1 * y1 + w2 * y2 + w3 * y3 + w4 * y4

So this in fact replaces tanh or sigmoid with the identity function in the output layer.
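As a small illustration of the formula above, the output neuron can be computed like this. The helper name `affine_output` and the numeric values for y1..y4 and w0..w4 are hypothetical, chosen only to show that the result is not confined to (0,1) even though every hidden activation is.

```python
def affine_output(bias, weights, hidden):
    """Identity-activation output: w0 + w1*y1 + w2*y2 + w3*y3 + w4*y4."""
    return bias + sum(w * y for w, y in zip(weights, hidden))

# Hypothetical hidden activations y1..y4 (each in (0, 1)) and weights w0..w4.
hidden = [0.2, 0.9, 0.5, 0.1]
w0 = 1.0
weights = [10.0, -3.0, 4.0, 20.0]

print(affine_output(w0, weights, hidden))
```

With a sigmoid applied on top, the same weighted sum would be squashed back into (0,1), which is exactly the limitation the question describes.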

UPDATE 2: Yes - the range of a linear activation is the set of all real numbers.
