0 votes

I am trying to implement the gradient descent algorithm to fit a straight line to noisy data, following the update equations in the image below, taken from Andrew Ng's course.

[Image: gradient descent update rules for linear regression, from Andrew Ng's course]
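For reference, the update rules in that image are the standard simultaneous updates for univariate linear regression with hypothesis h_theta(x) = theta_0 + theta_1*x:

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) \, x^{(i)}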

First, I am declaring the noisy straight line I want to fit:

xrange =(-10:0.1:10); % data range
ydata  = 2*(xrange)+5; % data with gradient 2, intercept 5
plot(xrange,ydata); grid on;
noise  = (2*randn(1,length(xrange))); % generating noise 
target = ydata + noise; % adding noise to data
figure; scatter(xrange,target); grid on; hold on; % plot a scatter

I then initialize both parameters and the objective function history as follows:

tita0 = 0 %intercept (randomised)
tita1 = 0 %gradient  (randomised)

% Initialize Objective Function History
J_history = zeros(num_iters, 1);

% Number of training examples
m = (length(xrange));

I proceed to write the gradient descent algorithm:

for iter = 1:num_iters

    h = tita0 + tita1.*xrange; % building the estimated line (hypothesis)

    %c = (1/(2*length(xrange)))*sum((h-target).^2)

    temp0 = tita0 - alpha*((1/m)*sum((h-target)));
    temp1 = tita1 - alpha*((1/m)*sum((h-target))).*xrange;
    tita0 = temp0;
    tita1 = temp1;

    J_history(iter) = (1/(2*m))*sum((h-target).^2); % Calculating cost from data to estimate

end

Last but not least, the plots. I am using MATLAB's inbuilt polyfit function to test the accuracy of my fit.

% print theta to screen
fprintf('Theta found by gradient descent: %f %f\n',tita0,  tita1(end));
fprintf('Minimum of objective function is %f \n',J_history(num_iters));

%Plot the linear fit
hold on; % keep previous plot visible
plot(xrange, tita0+xrange*tita1(end), '-'); title(sprintf('Cost is %g',J_history(num_iters))); % plotting line on scatter

% Validate with polyfit fnc
poly_theta = polyfit(xrange,ydata,1);
plot(xrange, poly_theta(1)*xrange+poly_theta(2), 'y--');
legend('Training data', 'Linear regression','Linear regression with polyfit')
hold off 

Result:

[Image: scatter of the noisy data with the gradient descent fit and the polyfit line]

As can be seen, my linear regression is not working well at all. It seems that both parameters (y-intercept and gradient) fail to converge to the optimal solution.

Any suggestions on what I may be doing wrong in my implementation would be appreciated. I can't seem to understand where my solution is diverging from the equations shown above. Thanks!

Can you plot the evolution of theta_0 and theta_1 as well as the cost? I am not a MATLAB programmer, but seeing the evolution of the thetas will give us insight into the issue. - Spinor8

2 Answers

2 votes

Your implementation for theta_1 is incorrect. In Andrew Ng's equation, the x's are inside the sum as well. What you have for theta_0 and theta_1 is

temp0 = tita0 - alpha*((1/m)*sum((h-target)));
temp1 = tita1 - alpha*((1/m)*sum((h-target))).*xrange;

Notice that sum((h-target)) appears in both formulas. You need to multiply by the x's before summing, not after. I am not a MATLAB programmer so I can't fix your code.

The big picture of your incorrect implementation is that you push the intercept and slope estimates in the same direction, because your change is always proportional to sum((h-target)). That is not how gradient descent works.
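For what it's worth, the difference in your own variable names would be something like this (an untested sketch, given the caveat above):

% incorrect: the residuals are summed on their own, and the scalar step is
% then scaled by xrange, which also turns tita1 into a vector
temp1 = tita1 - alpha*((1/m)*sum((h-target))).*xrange;

% intended: multiply each residual by its x before summing, so tita1 stays a scalar
temp1 = tita1 - alpha*(1/m)*sum((h-target).*xrange);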

1 vote

Change your update rule for tita1 as follows:

temp1 = tita1 - alpha*((1/m)*sum((h-target).*xrange));

Also, another remark: you don't really need the temporary variables here, since h is computed at the top of the loop and so updating tita0 before tita1 still uses the same predictions; see the sketch below.
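A minimal version of the loop without the temporaries (a sketch using the variable names from your question, assuming num_iters, alpha, m and J_history are defined as you have them) could look like:

for iter = 1:num_iters

    h = tita0 + tita1.*xrange;                              % predictions with current parameters

    tita0 = tita0 - alpha*(1/m)*sum(h - target);            % intercept update
    tita1 = tita1 - alpha*(1/m)*sum((h - target).*xrange);  % slope update: x inside the sum

    J_history(iter) = (1/(2*m))*sum((h - target).^2);       % cost at the start of this iteration

end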

By setting

num_iters = 100000
alpha = 0.001

I can recover

octave:152> tita0
tita0 =  5.0824
octave:153> tita1
tita1 =  2.0085