
I'm trying to implement linear regression with a single variable (exercise 1 from Stanford's machine learning course on Coursera).

My understanding is that this is the math (the gradient descent update rule for single-variable linear regression, where h_theta(x) = theta0 + theta1*x):

    theta0 := theta0 - alpha * (1/m) * sum_i( h_theta(x_i) - y_i )
    theta1 := theta1 - alpha * (1/m) * sum_i( (h_theta(x_i) - y_i) * x_i )

Now, my code implementation would be like this:

for iter = 1:num_iters

  temp1 = theta(1) - alpha * sum(X * theta - y) / m;
  temp2 = theta(2) - alpha * sum( (X * theta - y) .* X(2) ) / m;

  theta(1) = temp1;
  theta(2) = temp2;

end

where

  • m is the number of rows in X and y
  • alpha is the learning rate
  • theta is a 2×1 vector
  • X is an m×2 matrix formed by two m×1 vectors (one of ones, and one for the actual variable)
  • X * theta - y is an m×1 vector containing the difference between the i-th hypothesis and the i-th output y, and sum adds up the elements of that vector (i.e., the summation)

I tried doing this manually with a small example (m = 4), and I think my code is right... but it obviously isn't, or I wouldn't be writing here. When I run the algorithm, I get a different theta back depending on the initial theta I pass to the function, and if I plot the cost function it obviously isn't right for certain values of theta (not all):

This looks about right

This doesn't

That probably means I don't really understand the math (which would also explain why everyone else on Stack Overflow uses a transpose and I don't); the problem is that I don't know which part I'm having trouble with.

I'd really appreciate some insights, but I'd like to complete the exercise on my own. Basically, I'm looking for hints, not for the complete solution.

EDIT: Apparently it was not a logical error but a semantic one. When assigning temp2, I should have written (X * theta - y) .* X(:,2) instead of (X * theta - y) .* X(2). Basically, I was not selecting the second column of X (which is an m×2 matrix) but a single scalar, due to Octave's linear indexing syntax.
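For anyone mapping this to NumPy, the same pitfall exists in spirit: Octave's X(2) does column-major linear indexing and yields a single scalar, while X(:,2) yields the whole second column. A small sketch (toy matrix and variable names are my own):

```python
import numpy as np

# Toy m x 2 design matrix: a column of ones plus the actual feature.
X = np.array([[1.0, 5.0],
              [1.0, 6.0],
              [1.0, 7.0],
              [1.0, 8.0]])

# Octave's X(2) walks the matrix in column-major order and returns the
# 2nd element (the second 1 in the ones column) as a scalar.
scalar_like_octave_X2 = X.flatten(order='F')[1]   # value 1.0

# Octave's X(:,2) selects the entire second column, which is what the
# theta(2) gradient actually needs.
second_column = X[:, 1]                           # values [5, 6, 7, 8]

print(scalar_like_octave_X2)
print(second_column)
```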


3 Answers


I just looked at the course briefly, and it looks like you are mostly on the right track. Here are some helpful hints:

  • m is the size of the training set (you can think of it as the number of rows)
  • θ0 is a constant that changes simultaneously with θ1, and x, y are values from the given training set (data). (xi and yi just mean individual rows of your training data; e.g., x1, y1 would represent the first row.)
  • hθ(x) = θ0 + θ1*x is the linear equation you are fitting; it is not X*theta as you have in your code

Here is a starting place that you can work from:

for iter = 1:num_iters

  % Here X is the m×1 feature column (not the m×2 design matrix), and
  % temporaries are used so theta(1) and theta(2) update simultaneously.
  temp1 = theta(1) - alpha * sum( (theta(1) + theta(2).*X) - y ) / m;
  temp2 = theta(2) - alpha * sum( ((theta(1) + theta(2).*X) - y) .* X ) / m;

  theta(1) = temp1;
  theta(2) = temp2;

end
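As a sanity check, the loop above can be sketched in NumPy (the toy data and hyperparameters below are my own choices): on data generated from y = 1 + 2x, the parameters converge to roughly theta = [1, 2].

```python
import numpy as np

# Toy data generated from y = 1 + 2x; theta should converge to ~[1, 2].
x = np.array([1.0, 2.0, 3.0, 4.0])   # the single feature (no ones column)
y = 1.0 + 2.0 * x
m = len(y)

alpha = 0.05       # learning rate (assumed; small enough to converge here)
num_iters = 5000
theta = np.zeros(2)

for _ in range(num_iters):
    h = theta[0] + theta[1] * x              # h_theta(x) = theta0 + theta1*x
    temp0 = theta[0] - alpha * np.sum(h - y) / m
    temp1 = theta[1] - alpha * np.sum((h - y) * x) / m
    theta[0], theta[1] = temp0, temp1        # simultaneous update

print(theta)   # close to [1, 2]
```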

Please try this (Linear Regression with one variable):

m = length(y);
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % Normal Equation
    % theta = pinv(X'*X)*X'*y;

    predictions = X * theta;

    delta = (1/m) * X' * (predictions - y);
    theta = theta - alpha * delta; 

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end
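A NumPy translation of this vectorized update may help (the toy data below is assumed); the commented-out normal equation from the snippet above is used here as a cross-check, since it gives the exact least-squares solution in one step.

```python
import numpy as np

# Toy design matrix: first column of ones, second the actual feature.
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = 1.0 + 2.0 * X[:, 1]                      # data from y = 1 + 2x
m = len(y)

alpha, num_iters = 0.05, 5000
theta = np.zeros(2)

for _ in range(num_iters):
    predictions = X @ theta                  # h_theta(x) for every row at once
    delta = (1.0 / m) * X.T @ (predictions - y)
    theta = theta - alpha * delta            # one simultaneous vector update

# Normal equation: exact solution the iterative result should approach.
theta_exact = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta, theta_exact)
```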

According to the gradient descent algorithm, you have to update the values of theta(1) and theta(2) simultaneously. You cannot update theta(1) first and then calculate theta(2) using the updated theta(1).

Check this code for better understanding:

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    x = X(:,2);
    h = theta(1) + (theta(2)*x);

    theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
    theta_one  = theta(2) - alpha * (1/m) * sum((h - y) .* x);

    theta = [theta_zero; theta_one];

    J_history(iter) = computeCost(X, y, theta);

end

Here I have updated the values of theta(1) and theta(2) simultaneously. Gradient descent is defined as "repeat until convergence", updating all components of theta simultaneously in each step.
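To see why the simultaneous update matters, here is a small NumPy sketch (toy data and function names are my own) comparing one simultaneous step against one sequential step, where the sequential version recomputes the hypothesis with the already-updated theta0. The second parameter differs after a single iteration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
m, alpha = len(y), 0.1

def step_simultaneous(t0, t1):
    # Both gradients use the hypothesis from the *old* (t0, t1).
    h = t0 + t1 * x
    return (t0 - alpha * np.sum(h - y) / m,
            t1 - alpha * np.sum((h - y) * x) / m)

def step_sequential(t0, t1):
    # Wrong: t1's gradient uses a hypothesis built from the *new* t0.
    h = t0 + t1 * x
    t0 = t0 - alpha * np.sum(h - y) / m
    h = t0 + t1 * x
    t1 = t1 - alpha * np.sum((h - y) * x) / m
    return t0, t1

print(step_simultaneous(0.0, 0.0))
print(step_sequential(0.0, 0.0))   # theta1 already differs after one step
```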