3 votes

I'm trying to figure out gradient descent with Octave. With each iteration, my thetas get exponentially larger. I'm not sure what the issue is, as I'm copying another function directly.

Here are my matrices:

X = 1 98
    1 94
    1 93
    1 88
    1 84
    1 82
    1 79

y = 97
    94
    94
    78
    85
    85
    76

theta = 1
        1

I'm using this formula:

theta = theta - 0.001 * (1 / 7) * (X' * (X * theta - y))

I worked out the optimal thetas using the normal equation, but after only a few iterations of gradient descent my thetas are in the thousands. Any idea what's wrong?
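
For reference, here is a minimal loop along the lines of what I'm running (the iteration count of 10 is arbitrary):

X = [1 98; 1 94; 1 93; 1 88; 1 84; 1 82; 1 79];
y = [97; 94; 94; 78; 85; 85; 76];
theta = [1; 1];

for i = 1:10
    theta = theta - 0.001 * (1 / 7) * (X' * (X * theta - y));  % same update as above
end
theta  % already blown up by this point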


2 Answers

2 votes

You seem to be using gradient descent for linear regression with a learning rate that is too high, as the other answer also mentions. This post just adds some visualization and explains exactly what is happening in your case.

As shown in the figure below, when the learning rate is too high, the theta values oscillate around and repeatedly overshoot the global minimum of the convex cost surface, because each step is too large (the right-hand plot). If you decrease the learning rate (the left-hand plot), convergence is slower, but you eventually reach the global minimum.

You need to find an alpha (learning rate) that is just right, so that convergence is neither too slow nor unstable. The right value depends on the data, and scaling the features will help.

[Figure: the convex cost surface — left: a smaller learning rate converging steadily to the minimum; right: a larger learning rate oscillating past it]
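
As a rough sketch with the data from the question: mean-normalizing the feature column lets a moderate learning rate converge. The alpha of 0.1 and the iteration count here are illustrative choices, not tuned values:

X = [1 98; 1 94; 1 93; 1 88; 1 84; 1 82; 1 79];
y = [97; 94; 94; 78; 85; 85; 76];

mu    = mean(X(:, 2));              % mean of the raw feature
sigma = std(X(:, 2));               % its standard deviation
Xs = X;
Xs(:, 2) = (X(:, 2) - mu) / sigma;  % scaled feature: mean 0, std 1

theta = [1; 1];
alpha = 0.1;                        % a moderate rate works once the feature is scaled
for i = 1:500
    theta = theta - alpha * (1 / 7) * (Xs' * (Xs * theta - y));
end
theta  % converges; note these thetas are for the scaled feature

Remember that the resulting thetas describe the scaled feature, so predictions on new data must apply the same mu and sigma first.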

1 vote

If the values are blowing up, then your step size must be too big: you are overshooting the minimum on every iteration. With too large a step you will see a sequence of estimates like [100, -100, 1000, -1000, ...], oscillating between ever-larger positive and negative values. The simplest fix is to change your step size constant from

0.001 * (1 / 7)

to something like

1e-6

or perhaps even smaller.
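
For example, running both step sizes on the matrices from the question (10 iterations, arbitrary) shows the difference:

X = [1 98; 1 94; 1 93; 1 88; 1 84; 1 82; 1 79];
y = [97; 94; 94; 78; 85; 85; 76];

for step = [0.001 * (1 / 7), 1e-6]
    theta = [1; 1];
    for i = 1:10
        theta = theta - step * (X' * (X * theta - y));
    end
    printf("step = %g -> theta = [%g; %g]\n", step, theta(1), theta(2));
end

With the original step the thetas explode; with 1e-6 they barely move in ten iterations. That is the trade-off: a small enough step is stable but slow, which is why scaling the features (as the other answer suggests) is usually the better fix.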