I am trying to mimic the gradient descent algorithm for linear regression from Andrew NG's Machine learning course to Python, but for some reason my implementation is not working correctly.
Here's my implementation in Octave, it works correctly:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
prediction = X*theta;
margin_error = prediction - y;
gradient = 1/m * (alpha * (X' * margin_error));
theta = theta - gradient;
J_history(iter) = computeCost(X, y, theta);
end
end
However, when I translate this to Python for some reason it is not giving me accurate results. The cost seems to be going up rather than descending.
Here's my implementation in Python:
def gradientDescent(x, y, theta, alpha, iters):
m = len(y)
J_history = np.matrix(np.zeros((iters,1)))
for i in range(iters):
prediction = x*theta.T
margin_error = prediction - y
gradient = 1/m * (alpha * (x.T * margin_error))
theta = theta - gradient
J_history[i] = computeCost(x,y,theta)
return theta,J_history
My code is compiling and there isn't anything wrong. Please note this is theta:
theta = np.matrix(np.array([0,0]))
Alpha and iters is set to this:
alpha = 0.01
iters = 1000
When I run it, opt_theta, cost = gradientDescent(x, y, theta, alpha, iters)
and print out opt_theta, I get this:
matrix([[ 2.36890383e+16, -1.40798902e+16],
[ 2.47503758e+17, -2.36890383e+16]])
when I should get this:
matrix([[-3.24140214, 1.1272942 ]])
What am I doing wrong?
Edit:
Cost function
def computeCost(x, y, theta):
# Get length of data set
m = len(y)
# We get theta transpose because we are working with a numpy array [0,0] for example
prediction = x * theta.T
J = 1/(2*m) * np.sum(np.power((prediction - y), 2))
return J
.T
as "prime". correct? If so then why in the Octane code are you not using theta prime for prediction but in the python code you are using theta prime? I don't know if this will solve your problem but good luck. – gabeio