Why is my gradient descent algorithm not working correctly?

Question

I am trying to mimic the gradient descent algorithm for linear regression from Andrew NG's Machine learning course to Python, but for some reason my implementation is not working correctly.

Here's my implementation in Octave, it works correctly:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

J_history = zeros(num_iters, 1);

for iter = 1:num_iters


    prediction = X*theta;
    margin_error = prediction - y;

    gradient = 1/m * (alpha * (X' * margin_error));
    theta = theta - gradient;

    J_history(iter) = computeCost(X, y, theta);

end

end

However, when I translate this to Python for some reason it is not giving me accurate results. The cost seems to be going up rather than descending.

Here's my implementation in Python:

def gradientDescent(x, y, theta, alpha, iters):
    m = len(y)

    J_history = np.matrix(np.zeros((iters,1)))

    for i in range(iters):
        prediction = x*theta.T
        margin_error = prediction - y

        gradient = 1/m * (alpha * (x.T * margin_error))
        theta = theta - gradient

        J_history[i] = computeCost(x,y,theta)

    return theta,J_history

My code is compiling and there isn't anything wrong. Please note this is theta:

theta = np.matrix(np.array([0,0]))

Alpha and iters is set to this:

alpha = 0.01
iters = 1000

When I run it, opt_theta, cost = gradientDescent(x, y, theta, alpha, iters) and print out opt_theta, I get this:

matrix([[  2.36890383e+16,  -1.40798902e+16],
        [  2.47503758e+17,  -2.36890383e+16]])

when I should get this:

matrix([[-3.24140214, 1.1272942 ]])

What am I doing wrong?

Edit:

Cost function

def computeCost(x, y, theta):
#   Get length of data set
    m = len(y)

    # We get theta transpose because we are working with a numpy array [0,0] for example
    prediction = x * theta.T

    J = 1/(2*m) * np.sum(np.power((prediction - y), 2))

    return J

I see you are using .T as "prime". correct? If so then why in the Octane code are you not using theta prime for prediction but in the python code you are using theta prime? I don't know if this will solve your problem but good luck. — gabeio
@gabeio In my Python code I am passing in an array and not a vector. In the Octave code theta is already a vector, but in Python it is not so I have to make it a vector — user2896120

Anton Alekseev Anton Alekseev · Accepted Answer · 2018-02-07T07:41:31

Look there:

>>> A = np.matrix([3,3,3])
>>> B = np.matrix([[1,1,1], [2,2,2]])
>>> A-B
matrix([[2, 2, 2],
        [1, 1, 1]])

Matrices are broadcasted together.

"it's because np.matrix inherits from np.array. np.matrix overrides multiplication, but not addition and subtraction"

In yours situation theta(1x2) subtract gradient(2x1) and in result you have got 2x2. Try to transpose gradient before subtracting.

theta = theta - gradient.T

Why is my gradient descent algorithm not working correctly?

1 Answers