I'm trying to apply what I learned in Andrew Ng's Coursera course. I implemented this same logistic regression algorithm, in the same way I'm doing it here, on the Kaggle Titanic dataset and it worked fine, but with this data (UFC fights) I'm getting a negative cost. I've stripped the dataset down to just two features (the opponent and the round the fight ended in) and then took the z-score of each column (there's a rough sketch of that step below the matrix).
This is my design matrix (the real one is much bigger, but I get the same negative cost even with this small subset):
array([[ 1. , -0.50373455, -0.35651205],
[ 1. , -1.54975476, 0.84266484],
[ 1. , 0.63737841, -1.55568894],
[ 1. , 1.11284214, 0.84266484],
[ 1. , -1.07429103, 0.84266484],
[ 1. , -1.07429103, -1.55568894],
[ 1. , 0.25700742, 0.84266484],
[ 1. , -1.83503301, -0.35651205],
[ 1. , 1.20793489, -0.35651205],
[ 1. , 1.58830588, -1.55568894],
[ 1. , -1.16938378, 0.84266484],
[ 1. , -0.78901279, -0.35651205],
[ 1. , -0.50373455, -1.55568894],
[ 1. , 1.0177494 , -0.35651205],
[ 1. , -0.21845631, 0.84266484],
[ 1. , 0.92265665, -1.55568894],
[ 1. , 0.06682193, 0.84266484],
[ 1. , 1.30302764, -0.35651205],
[ 1. , 0.44719292, -0.35651205],
[ 1. , -0.69392004, 0.84266484],
[ 1. , 1.39812038, -1.55568894],
[ 1. , -0.97919828, 0.84266484],
[ 1. , 0.16191468, 0.84266484],
[ 1. , -1.54975476, 0.84266484],
[ 1. , -0.02827082, 0.84266484],
[ 1. , 0.63737841, -0.35651205],
[ 1. , -0.88410554, 0.84266484],
[ 1. , 0.06682193, 0.84266484],
[ 1. , -1.73994026, 0.84266484],
[ 1. , -0.12336356, 0.84266484],
[ 1. , -0.97919828, 0.84266484],
[ 1. , 0.8275639 , -1.55568894],
[ 1. , 0.73247116, 0.84266484],
[ 1. , 1.68339863, -1.55568894],
[ 1. , 0.35210017, -1.55568894],
[ 1. , -0.02827082, 0.84266484],
[ 1. , 1.30302764, 0.84266484]])
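For reference, the standardization and the bias column were built roughly like this; df, "opponent", and "round_ended" are placeholder names for my actual DataFrame and columns, not my exact code:

import numpy as np

# Rough sketch only: `df`, "opponent" and "round_ended" stand in for my real
# DataFrame and column names.
raw = df[["opponent", "round_ended"]].to_numpy(dtype=float)
zscored = (raw - raw.mean(axis=0)) / raw.std(axis=0)       # z-score each column
X = np.hstack([np.ones((zscored.shape[0], 1)), zscored])   # prepend the column of 1s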
My weights vector is initialized to all zeros:
array([[0.],
[0.],
[0.]])
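It's created with something along the lines of:

theta = np.zeros((3, 1))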
For completeness, here's the Y vector:
array([[0],
[0],
[1],
[1],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[1],
[1],
[0],
[1],
[1],
[0],
[0],
[1],
[1],
[1],
[1],
[0],
[0],
[1],
[1],
[1],
[1],
[0],
[1]], dtype=uint8)
Here are my cost, sigmoid, and predict functions:

import numpy as np

def cost_function(X, Y, theta):
    # Cross-entropy (log-loss) averaged over the m training examples
    m = len(Y)
    h = predict(X, theta)
    cost = (np.dot(-Y.T, np.log(h)) - np.dot((1 - Y).T, np.log(1 - h))) / m
    return cost

def sigmoid(z):
    # Logistic function: squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(X, theta):
    # Predicted probability that y = 1 for each row of X
    z = np.dot(X, theta)
    return sigmoid(z)
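For what it's worth, my understanding is that with theta still all zeros every prediction comes out as sigmoid(0) = 0.5, so the very first cost should be around -log(0.5) ≈ 0.693 rather than anything negative. A quick check would be something like:

theta = np.zeros((3, 1))
print(cost_function(X, Y, theta))   # I'd expect roughly 0.693 (i.e. -log(0.5)) here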
Here's the gradient descent function:
def gradient_descent(X, Y, theta, rate):
    # One batch gradient-descent step on the cross-entropy cost
    m = len(Y)
    h = predict(X, theta)
    gradient = rate * np.dot(X.T, (h - Y)) / m
    theta -= gradient
    return theta
Then I use this train function to call both of them over n iterations:
def train(X, Y, theta, rate, iters):
    # Run gradient descent for `iters` iterations, recording the cost each time
    cost_history = []
    for i in range(iters):
        theta = gradient_descent(X, Y, theta, rate)
        cost = cost_function(X, Y, theta)
        cost_history.append(cost)
        if i % 100 == 0:
            print("iter: " + str(i) + " cost: " + str(cost))
    return theta, cost_history
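I call it along these lines (the learning rate and iteration count here are just representative values, not necessarily my exact settings):

# rate and iters are placeholder values for illustration
theta = np.zeros((3, 1))
theta, cost_history = train(X, Y, theta, rate=0.1, iters=1000)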
Then at the end of this, the cost values printed and plotted from cost_history come out negative.
That's what I'm having trouble understanding: why is the cost negative? Is it a problem with the code, with the data, or is this how it's supposed to work and I'm missing something? I've spent the last day trying to figure it out and haven't gotten anywhere. With just these two features, the trained weights still correctly predict the outcome of a fight about 54% of the time on the test set, but the cost is negative.
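For reference, that 54% comes from thresholding the predicted probabilities at 0.5 and comparing them with the true labels, roughly like this (X_test and Y_test are placeholder names for my held-out split, not my exact variables):

preds = (predict(X_test, theta) >= 0.5).astype(int)   # 0/1 predictions from the probabilities
accuracy = np.mean(preds == Y_test)                    # fraction correct, about 0.54 for me
print(accuracy)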