
Gradient descent update rule:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
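In vectorized Octave form (the same update implemented by the loop in the source below, where x is the design matrix with a leading column of ones):

theta = theta - alpha * (1/m) * x' * (x * theta - y);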

Using these values with this rule:

x = [10 20 30 40 50 60 70 80 90 100]
y = [4 7 8 4 5 6 7 5 3 4]

After two iterations with a learning rate of 0.07, theta is:

  -73.396
-5150.803

After three iterations, theta is:

   1.9763e+04
   1.3833e+06

So theta keeps growing with each iteration, i.e. gradient descent is diverging, which suggests the learning rate is too large.
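One way to confirm this (a minimal sketch, reusing the variables m, x, y, theta, alpha and iterations defined in the source below) is to record the squared-error cost each iteration; if the cost grows instead of shrinking, alpha is too large:

% Track the cost J(theta) per iteration; a rising J means alpha is too large.
J = zeros(iterations, 1);
for iter = 1:iterations
    theta = theta - alpha * (1/m) * x' * (x * theta - y);
    J(iter) = (1 / (2*m)) * sum((x * theta - y) .^ 2);
end
plot(1:iterations, J);   % should decrease and flatten as theta converges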

So I set:

iterations = 300;
alpha = 0.000007;

theta is now:

 0.0038504
 0.0713561

Should these theta values allow me to draw a straight line through the data, and if so, how? I've just begun trying to understand gradient descent, so please point out any errors in my logic.

Source:

x = [10
    20
    30
    40
    50
    60
    70
    80
    90
    100]
y = [4
    7
    8
    4
    5
    6
    7
    5
    3
    4]

m = length(y);              % number of training examples

x = [ones(m, 1), x];        % prepend a column of ones (intercept term)

theta = zeros(2, 1);        % initialize parameters to zero

iterations = 300;
alpha = 0.000007;

for iter = 1:iterations
    % vectorized gradient descent update
    theta = theta - alpha * (1/m) * x' * (x * theta - y);
    theta                   % display theta each iteration
end

plot(x(:,2), y, 'o');       % plot y against the raw inputs, not the intercept column
ylabel('Response Time')
xlabel('Time since 0')

Update:

So plotting x * theta (the model's prediction at each x value) gives a straight line:

plot(x(:,2), x * theta, '-')

(plot showing the fitted straight line)

Update 2:

How does this relate to the linear regression model

$$h_\theta(x) = \theta_0 + \theta_1 x$$

given that the model also outputs a prediction value?
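If I understand correctly, theta plugs directly into that hypothesis, so a prediction for a new input is just a dot product (a minimal sketch; x_new is a made-up example value):

% Predict the response for a hypothetical new input x_new
% using the learned theta: h(x) = theta(1) + theta(2) * x.
x_new = 55;                        % made-up example input
prediction = [1, x_new] * theta;   % the model's predicted y for x_new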

@Roshan Mathews sorry! I'm thinking about how to rework this question in relation to your comment, thanks. - blue-sky
@Roshan Mathews question updated - blue-sky
Another link, 3 pages with a nice applet and Barzilai-Borwein method for adaptive stepsize: onmyphd.com/?p=gradient.descent - denis

1 Answer


Yes, you should be able to draw a straight line. In regression, gradient descent is an algorithm used to minimize the cost (error) function of your linear regression model. You use the gradient as a path down to the minimum of the cost function, and the learning rate determines how quickly you travel along it: go too fast and you might overshoot the global minimum. Once you have reached the desired minimum, plug those values of theta into your model to obtain the fitted model. In the one-dimensional case, this is a straight line.
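For example, with the variables from your script (a minimal sketch; x there is the design matrix, so x(:,2) is the raw input column), you can overlay the fitted line on the data like this:

% Plot the data and the fitted line h(x) = theta(1) + theta(2) * x.
plot(x(:,2), y, 'o');            % the data points
hold on;
plot(x(:,2), x * theta, '-');    % the fitted straight line
hold off;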

Check out this article, which gives a nice introduction to gradient descent.