I'm trying to implement linear regression with a single variable (exercise 1 from Stanford's machine learning course on Coursera).
My understanding is that this is the math:
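For reference, the standard gradient-descent update rules for single-variable linear regression (both assignments performed simultaneously on each iteration) are:

```latex
% hypothesis: h_\theta(x) = \theta_0 + \theta_1 x
\theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\theta_1 := \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```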
Now, my code implementation looks like this:
    for iter = 1:num_iters
        temp1 = theta(1) - alpha * sum(X * theta - y) / m;
        temp2 = theta(2) - alpha * sum( (X * theta - y) .* X(2) ) / m;
        theta(1) = temp1;
        theta(2) = temp2;
    end
where
- m is the number of rows in X and y
- alpha is the learning rate
- theta is a 2x1 vector
- X is an mx2 matrix formed by two mx1 column vectors (one of ones, and one for the actual variable)
- X * theta - y is an mx1 vector containing the difference between the i-th hypothesis and the i-th output y, and the sum is just that: the summation of every element of that vector.
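To make those dimensions concrete, here is a tiny check with made-up data (the values of X, y, and theta are hypothetical, chosen only for illustration):

```octave
% Hypothetical m = 4 check of the shapes described above
X = [1 1; 1 2; 1 3; 1 4];   % mx2: a column of ones plus the actual variable
y = [2; 4; 6; 8];           % mx1 vector of outputs
theta = [0; 0];             % 2x1 initial parameters
m = rows(X);                % m = 4

err = X * theta - y;        % mx1 vector: i-th hypothesis minus i-th output
s = sum(err);               % scalar: the summation over all m examples
```

Here `size(err)` is `4 1`, matching the mx1 claim above, and `sum(err)` collapses it to a single scalar.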
I tried doing this manually with a small example (m = 4), and I think my code is right... but it obviously isn't, or I wouldn't be writing here. When I run the algorithm, I get a different theta back depending on the initial theta I pass to the function, and if I plot the cost function it obviously isn't right for certain values of theta (not all).
That probably means I don't really understand the math (which would also explain why everyone else on Stack Overflow uses a transpose and I don't); the problem is that I don't know which part I'm having trouble with.
I'd really appreciate some insights, but I'd like to complete the exercise on my own. Basically, I'm looking for hints, not for the complete solution.
EDIT: Apparently it was not a logical error, but a semantic one. When assigning temp2, I should have written (X * theta - y) .* X(:,2) instead of (X * theta - y) .* X(2). Basically, I was not selecting the second column of X (which is an mx2 matrix), but a single scalar: in Octave, X(2) is linear indexing and returns the second element of the matrix, not the second column.
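With that fix applied, the loop looks like this (a sketch of the unvectorized form; the temp variables keep the two updates simultaneous):

```octave
% Gradient descent with the X(:,2) fix; X, y, theta, alpha, m, num_iters
% are assumed to be set up as described above
for iter = 1:num_iters
    temp1 = theta(1) - alpha * sum(X * theta - y) / m;
    temp2 = theta(2) - alpha * sum((X * theta - y) .* X(:,2)) / m;
    theta(1) = temp1;
    theta(2) = temp2;
end
```

Equivalently, the fully vectorized form that other answers write with a transpose is `theta = theta - (alpha / m) * X' * (X * theta - y)`: the 2xm matrix X' times the mx1 error vector computes both summations at once.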