
I'm trying to understand the Gradient descent algorithm for linear regression.

The question is: why do we multiply by x(i) at the end of the theta1 update, but not at the end of the theta0 update?

Thanks a lot!

[Image: the gradient descent update rules,
theta0 := theta0 - alpha * (1/m) * sum over i of (h_theta(x^(i)) - y^(i))
theta1 := theta1 - alpha * (1/m) * sum over i of (h_theta(x^(i)) - y^(i)) * x^(i)]


3 Answers


The hypothesis is theta0 + theta1*x. Differentiating it with respect to theta0 and theta1 gives 1 and x respectively, so x appears in the update for theta1 but not in the update for theta0. For more details, refer to this document: cs229-notes1.
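
For concreteness, here is a minimal NumPy sketch of the two simultaneous updates (a sketch only; the function and variable names are my own, assuming the standard batch gradient descent updates):

import numpy as np

def gradient_descent_step(theta0, theta1, x, y, alpha):
    # One batch step for the hypothesis h(x) = theta0 + theta1 * x,
    # where x and y are NumPy arrays of the training examples.
    m = len(x)
    error = (theta0 + theta1 * x) - y          # h(x^(i)) - y^(i) for all i
    # d/dtheta0 of the hypothesis is 1, so no x factor here
    new_theta0 = theta0 - alpha * (1 / m) * np.sum(error)
    # d/dtheta1 of the hypothesis is x, hence the extra x^(i) factor
    new_theta1 = theta1 - alpha * (1 / m) * np.sum(error * x)
    return new_theta0, new_theta1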


In short: because of the partial derivative and an application of the chain rule.

For Theta 0, when you take the derivative of the loss function (MSE) with respect to Theta 0 (or Beta 0, the intercept), the derivative has the form shown at the rightmost of eq. 1 in the image.

Imagine:
Y = Mx + C
M = Theta 1
C = Theta 0

Loss function: (Y - (Mx + C))^2

By the chain rule, the derivative has the form 2*f(x)*f'(x), where f(x) = Y - (Mx + C) is the inner expression. For Theta 0, the inner derivative f'(x) is -1, the derivative of the inner expression with respect to C (watch the video linked below to follow the differentiation). So

2(Y - (Mx + C)) * derivative with respect to C of (Y - (Mx + C))
= -2(Y - (Mx + C)) [the inner derivative is -1; the constant factor in front can be disregarded, since it is absorbed by the learning rate]


For Theta 1, when you take the derivative of the loss function (MSE) with respect to Theta 1 (or Beta 1, the slope), the derivative again has the form shown at the rightmost of eq. 1. In this case the inner derivative is -x, because...

2(Y - (Mx + C)) * derivative with respect to M of (Y - (Mx + C))
= -2(Y - (Mx + C)) * x [the only term involving M is Mx, whose derivative is x; the minus sign comes from -(Mx + C)]
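
You can verify both partial derivatives symbolically; here is a quick sketch using sympy (my own check, not part of the original derivation):

import sympy as sp

Y, M, C, x = sp.symbols('Y M C x')
loss = (Y - (M * x + C))**2     # squared error for one point

dC = sp.diff(loss, C)           # equivalent to -2*(Y - (M*x + C)): no x factor
dM = sp.diff(loss, M)           # equivalent to -2*(Y - (M*x + C))*x: the extra x factor
print(sp.simplify(dC))
print(sp.simplify(dM))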

Here is a video that can help: https://www.youtube.com/watch?v=sDv4f4s2SB8


The loss function for linear regression is given by

J = (h_theta(x^(i)) - y^(i))^2

and the gradient is the derivative of the loss. Therefore,

dJ/dtheta_j = 2 * (h_theta(x^(i)) - y^(i)) * x_j^(i)

For theta0 the corresponding feature is the constant x_0^(i) = 1, hence

dJ/dtheta_0 = 2 * (h_theta(x^(i)) - y^(i))
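
As a sanity check, these analytic derivatives can be compared against a finite-difference approximation; here is a small sketch (the values and names are made up for illustration):

def loss(theta0, theta1, x, y):
    # Squared error for a single training example.
    return ((theta0 + theta1 * x) - y) ** 2

def analytic_grads(theta0, theta1, x, y):
    error = (theta0 + theta1 * x) - y
    return 2 * error, 2 * error * x    # x_0 = 1 for theta0, x_1 = x for theta1

t0, t1, x_i, y_i, eps = 0.5, -1.2, 3.0, 2.0, 1e-6
num0 = (loss(t0 + eps, t1, x_i, y_i) - loss(t0 - eps, t1, x_i, y_i)) / (2 * eps)
num1 = (loss(t0, t1 + eps, x_i, y_i) - loss(t0, t1 - eps, x_i, y_i)) / (2 * eps)
print(num0, analytic_grads(t0, t1, x_i, y_i)[0])   # the two numbers should match closely
print(num1, analytic_grads(t0, t1, x_i, y_i)[1])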