I have implemented a very simple linear regression trained with gradient descent in JavaScript, but after consulting multiple sources and trying several things, I cannot get it to converge.
The data is perfectly linear: the inputs are just the numbers 0 to 30, and the correct output to learn for each input x is x * 3.
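For concreteness, the data is generated roughly like this (the variable names are only illustrative, not my exact code):

    // Training data: inputs 0..30, expected outputs x * 3
    const inputs = [];
    const outputs = [];
    for (let x = 0; x <= 30; x++) {
        inputs.push(x);
        outputs.push(x * 3);
    }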
This is the logic behind the gradient descent:
train(input, output) {
    const predictedOutput = this.predict(input);
    const delta = output - predictedOutput;        // error for this single sample
    this.m += this.learningRate * delta * input;   // update the slope
    this.b += this.learningRate * delta;           // update the bias
}

predict(x) {
    return x * this.m + this.b;
}
I took the formulas from different places, including:
- Exercises from Udacity's Deep Learning Foundations Nanodegree
- Andrew Ng's course on Gradient Descent for Linear Regression (also here)
- Stanford's CS229 Lecture Notes
- some other PDF slides I found from Carnegie Mellon
I have already tried:
- normalizing input and output values to the [-1, 1] range
- normalizing input and output values to the [0, 1] range
- normalizing input and output values to have mean = 0 and stddev = 1 (sketched right after this list)
- reducing the learning rate (1e-7 is as low as I went)
- having a linear data set with no bias at all (y = x * 3)
- having a linear data set with non-zero bias (y = x * 3 + 2)
- initializing the weights with random non-zero values between -1 and 1
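For reference, the mean = 0 / stddev = 1 normalization mentioned above looks roughly like this (the helper name is only for illustration):

    // z-score normalization: (value - mean) / stddev
    function normalize(values) {
        const mean = values.reduce((a, b) => a + b, 0) / values.length;
        const variance = values.reduce((a, b) => a + (b - mean) * (b - mean), 0) / values.length;
        const stddev = Math.sqrt(variance) || 1;   // guard against a zero stddev
        return values.map(v => (v - mean) / stddev);
    }

    const normalizedInputs = normalize(inputs);
    const normalizedOutputs = normalize(outputs);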
Still, the weights (this.m and this.b) do not approach the values they should learn (3 and 0); instead they diverge towards infinity.
I'm obviously doing something wrong, but I cannot figure out what it is.
Update: Here's a little bit more context that may help figure out what my problem is exactly:
I'm trying to model a simple approximation to a linear function, with online learning by a linear regression pseudo-neuron. With that, my parameters are:
- weights: [this.m, this.b]
- inputs: [x, 1]
- activation function: the identity function, z(x) = x

As such, my net will be expressed by y = this.m * x + this.b * 1, simulating the data-driven function that I want to approximate (y = 3 * x).
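Written out as a dot product of weights and inputs followed by the identity activation, the same pseudo-neuron would look something like this (a standalone function purely for illustration):

    // weights = [m, b], inputs = [x, 1], identity activation z(s) = s
    function predictVectorized(weights, x) {
        const inputs = [x, 1];
        const z = weights[0] * inputs[0] + weights[1] * inputs[1];   // dot product
        return z;   // identity activation leaves it unchanged
    }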
What I want is for my network to "learn" the parameters this.m = 3 and this.b = 0, but it seems I get stuck at a local minimum.
My error function is the mean-squared error:
error(allInputs, allOutputs) {
    let error = 0;
    for (let i = 0; i < allInputs.length; i++) {
        const x = allInputs[i];
        const y = allOutputs[i];
        const predictedOutput = this.predict(x);
        const delta = y - predictedOutput;
        error += delta * delta;            // squared error for this sample
    }
    return error / allInputs.length;       // mean over all samples
}
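I call it after each pass over the data to keep an eye on the trend (the snippet is illustrative): instead of shrinking, the error just keeps growing while this.m and this.b run off towards infinity, as described above.

    const mse = model.error(inputs, outputs);
    console.log(`mse = ${mse}, m = ${model.m}, b = ${model.b}`);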
My logic for updating my weights will be (according to the sources I've checked so far): wi -= alpha * dError/dwi. For the sake of simplicity, I'll call my weights this.m and this.b, so we can relate it back to my JavaScript code. I'll also call y^ the predicted value.
From here:

    error = y - y^
          = y - this.m * x + this.b

    dError/dm = -x
    dError/db = 1

And so, applying that to the weight correction logic:

    this.m += alpha * x
    this.b -= alpha * 1
But this doesn't seem correct at all.
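For comparison, if I differentiate the squared error delta^2 for a single sample instead of the raw difference, I seem to get updates that do involve the input and that match my original train() above up to a factor of 2 (which could be folded into the learning rate). Sketching it here as a sanity check:

    // E = (y - (m * x + b))^2 for one sample
    //   dE/dm = -2 * x * (y - y^)
    //   dE/db = -2 * (y - y^)
    // Applying w -= alpha * dE/dw:
    trainSquaredError(input, output) {
        const delta = output - this.predict(input);           // (y - y^)
        this.m -= this.learningRate * (-2 * delta * input);   // i.e. m += 2 * alpha * delta * input
        this.b -= this.learningRate * (-2 * delta);           // i.e. b += 2 * alpha * delta
    }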
Comments:

this.m += this.learningRate * delta * input does not look familiar: the input has nothing to do here. Your bias handling also looks strange. As I'm not familiar with JS, I expect those expressions are vectorized ones? If not, start from scratch. – sascha

w += learningRate * gradient * input? It always appears as the result of the derivative of y = m.x + b with respect to m. (Either that, or I'm completely misunderstanding it.) – Alpha