import numpy as np

def computeCost(X, y, theta):
    # X, y, theta are np.matrix objects, so * is matrix multiplication
    inner = np.power(((X * theta.T) - y), 2)  # squared error per example
    return np.sum(inner) / (2 * len(X))

def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))
    params = int(theta.ravel().shape[1])  # number of parameters (theta flattened)
    cost = np.zeros(iters)
    for i in range(iters):
        err = (X * theta.T) - y  # prediction error for every example
        for j in range(params):
            term = np.multiply(err, X[:, j])  # error weighted by feature j
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp  # update all parameters simultaneously
        cost[i] = computeCost(X, y, theta)  # record cost for this iteration
    return theta, cost
Here is the code for the linear regression cost function and gradient descent that I found in a tutorial, but I am not quite sure how it works.

First, I understand how the computeCost code works, since it is just the sum of squared errors multiplied by 1/(2M), where M is the number of training examples.
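To convince myself of that, I computed the cost by hand on a tiny made-up data set (the numbers here are just my own toy example, not from the tutorial):

```python
import numpy as np

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

# two training examples: a bias column of ones plus one feature
X = np.matrix([[1.0, 1.0], [1.0, 2.0]])
y = np.matrix([[1.0], [2.0]])
theta = np.matrix([[0.0, 0.0]])

# with theta = 0 every prediction is 0, so the squared errors are 1 and 4,
# and the cost is (1 + 4) / (2 * 2) = 1.25
print(computeCost(X, y, theta))  # 1.25
```

That matched my by-hand calculation, so I am confident about this part.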
For the gradientDescent code, I just don't understand how it works in general. I know the formula for updating theta is something like theta = theta - (learning rate) * (derivative of the cost function J). But I am not sure where the expression (alpha / len(X)) * np.sum(term) on the line updating temp[0, j] comes from.
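As an experiment, I tried comparing that term against a numerical derivative of the cost function, to check whether np.sum(term) / len(X) really is the partial derivative of J with respect to theta[0, j]. The data set here is again a toy example of my own:

```python
import numpy as np

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

# three examples: bias column of ones plus one feature
X = np.matrix([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.matrix([[2.0], [2.5], [3.5]])
theta = np.matrix([[0.5, 0.5]])

# the tutorial's inner-loop quantity for parameter j
j = 1
err = (X * theta.T) - y
term = np.multiply(err, X[:, j])
analytic = np.sum(term) / len(X)

# numerical partial derivative of J w.r.t. theta[0, j], via central differences
eps = 1e-6
theta_plus = theta.copy(); theta_plus[0, j] += eps
theta_minus = theta.copy(); theta_minus[0, j] -= eps
numeric = (computeCost(X, y, theta_plus) - computeCost(X, y, theta_minus)) / (2 * eps)

print(analytic, numeric)  # the two values agree closely
```

They do agree, so it seems (alpha / len(X)) * np.sum(term) is alpha times the derivative, but I would like to understand why that is the derivative.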
Please help me to understand!