
I'm trying to implement linear regression with a single variable (exercise 1 from Stanford's machine learning course on Coursera).

My understanding is that this is the math (the gradient descent update rule for single-variable linear regression, where h_theta(x) = theta0 + theta1*x):

    theta0 := theta0 - alpha * (1/m) * sum_i( h_theta(x_i) - y_i )
    theta1 := theta1 - alpha * (1/m) * sum_i( (h_theta(x_i) - y_i) * x_i )

Now, my code implementation would be like this:

for iter = 1:num_iters

  temp1 = theta(1) - alpha * sum(X * theta - y) / m;
  temp2 = theta(2) - alpha * sum( (X * theta - y) .* X(2) ) / m;

  theta(1) = temp1;
  theta(2) = temp2;

end

where

  • m is the number of rows in X and y
  • alpha is the learning rate
  • theta is a 2×1 vector
  • X is an m×2 matrix formed by two m×1 vectors (one of ones, and one for the actual variable)
  • X * theta - y is an m×1 vector containing the difference between the i-th hypothesis and the i-th output y, and sum adds up the elements of that vector (i.e., the summation)

I tried doing this manually with a small example (m = 4), and I think my code is right... but it obviously isn't, or I wouldn't be writing here. When I run the algorithm, I get a different theta back depending on the initial theta I pass to the function, and if I plot the cost function it obviously isn't right for certain values of theta (not all):

This looks about right

This doesn't

That probably means I don't really understand the math (which would also explain why everyone else on Stack Overflow uses a transpose and I don't); the problem is that I don't know which part I'm having trouble with.

I'd really appreciate some insights, but I'd like to complete the exercise on my own. Basically, I'm looking for hints, not for the complete solution.

EDIT: Apparently it was not a logical error but a semantic one. When assigning temp2, I should have written (X * theta - y) .* X(:,2) instead of (X * theta - y) .* X(2). Basically, I was not selecting the second column of X (which is an m×2 matrix) but a single scalar, due to Octave's linear indexing syntax.
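For anyone mapping this to NumPy, the same pitfall exists in spirit: Octave's X(2) does column-major linear indexing and yields a single scalar, while X(:,2) yields the whole second column. A small sketch (toy matrix and variable names are my own):

```python
import numpy as np

# Toy m x 2 design matrix: a column of ones plus the actual feature.
X = np.array([[1.0, 5.0],
              [1.0, 6.0],
              [1.0, 7.0],
              [1.0, 8.0]])

# Octave's X(2) walks the matrix in column-major order and returns the
# 2nd element (the second 1 in the ones column) as a scalar.
scalar_like_octave_X2 = X.flatten(order='F')[1]   # value 1.0

# Octave's X(:,2) selects the entire second column, which is what the
# theta(2) gradient actually needs.
second_column = X[:, 1]                           # values [5, 6, 7, 8]

print(scalar_like_octave_X2)
print(second_column)
```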


3 Answers


I just looked at the course briefly, and it looks like you are mostly on the right track. Here are some helpful hints:

  • m is the size of the training set (you can think of it as the number of rows)
  • θ0 is a constant that changes simultaneously with θ1, and x, y are values from the given training set (data). (xi and yi just mean individual rows of your training data; e.g., x1, y1 would represent the first row.)
  • hθ(x) = θ0 + θ1*x is the linear equation you are fitting; it is not X*theta as you have in your code

Here is a starting place that you can work from:

for iter = 1:num_iters

  % Here X is the m×1 feature column (not the m×2 design matrix), and
  % temporaries are used so theta(1) and theta(2) update simultaneously.
  temp1 = theta(1) - alpha * sum( (theta(1) + theta(2).*X) - y ) / m;
  temp2 = theta(2) - alpha * sum( ((theta(1) + theta(2).*X) - y) .* X ) / m;

  theta(1) = temp1;
  theta(2) = temp2;

end
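As a sanity check, the loop above can be sketched in NumPy (the toy data and hyperparameters below are my own choices): on data generated from y = 1 + 2x, the parameters converge to roughly theta = [1, 2].

```python
import numpy as np

# Toy data generated from y = 1 + 2x; theta should converge to ~[1, 2].
x = np.array([1.0, 2.0, 3.0, 4.0])   # the single feature (no ones column)
y = 1.0 + 2.0 * x
m = len(y)

alpha = 0.05       # learning rate (assumed; small enough to converge here)
num_iters = 5000
theta = np.zeros(2)

for _ in range(num_iters):
    h = theta[0] + theta[1] * x              # h_theta(x) = theta0 + theta1*x
    temp0 = theta[0] - alpha * np.sum(h - y) / m
    temp1 = theta[1] - alpha * np.sum((h - y) * x) / m
    theta[0], theta[1] = temp0, temp1        # simultaneous update

print(theta)   # close to [1, 2]
```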

Please try this (Linear Regression with one variable):

m = length(y);
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % Normal Equation
    % theta = pinv(X'*X)*X'*y;

    predictions = X * theta;

    delta = (1/m) * X' * (predictions - y);
    theta = theta - alpha * delta; 

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end
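A NumPy translation of this vectorized update may help (the toy data below is assumed); the commented-out normal equation from the snippet above is used here as a cross-check, since it gives the exact least-squares solution in one step.

```python
import numpy as np

# Toy design matrix: first column of ones, second the actual feature.
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = 1.0 + 2.0 * X[:, 1]                      # data from y = 1 + 2x
m = len(y)

alpha, num_iters = 0.05, 5000
theta = np.zeros(2)

for _ in range(num_iters):
    predictions = X @ theta                  # h_theta(x) for every row at once
    delta = (1.0 / m) * X.T @ (predictions - y)
    theta = theta - alpha * delta            # one simultaneous vector update

# Normal equation: exact solution the iterative result should approach.
theta_exact = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta, theta_exact)
```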

According to the gradient descent algorithm, you have to update the values of theta(1) and theta(2) simultaneously. You cannot update theta(1) first and then calculate theta(2) using the updated theta(1).

Check this code for better understanding:

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    x = X(:,2);
    h = theta(1) + (theta(2)*x);

    theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
    theta_one  = theta(2) - alpha * (1/m) * sum((h - y) .* x);

    theta = [theta_zero; theta_one];

    J_history(iter) = computeCost(X, y, theta);

end

Here I have updated the values of theta(1) and theta(2) simultaneously. Gradient descent is defined as "repeat until convergence", updating all components of theta simultaneously in each step.
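To see why the simultaneous update matters, here is a small NumPy sketch (toy data and function names are my own) comparing one simultaneous step against one sequential step, where the sequential version recomputes the hypothesis with the already-updated theta0. The second parameter differs after a single iteration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
m, alpha = len(y), 0.1

def step_simultaneous(t0, t1):
    # Both gradients use the hypothesis from the *old* (t0, t1).
    h = t0 + t1 * x
    return (t0 - alpha * np.sum(h - y) / m,
            t1 - alpha * np.sum((h - y) * x) / m)

def step_sequential(t0, t1):
    # Wrong: t1's gradient uses a hypothesis built from the *new* t0.
    h = t0 + t1 * x
    t0 = t0 - alpha * np.sum(h - y) / m
    h = t0 + t1 * x
    t1 = t1 - alpha * np.sum((h - y) * x) / m
    return t0, t1

print(step_simultaneous(0.0, 0.0))
print(step_sequential(0.0, 0.0))   # theta1 already differs after one step
```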