I'm trying to apply what I learned in Andrew Ng's Coursera course. I implemented this same logistic regression algorithm, in the same way I'm doing it here, on the Kaggle Titanic dataset and it worked fine, but with this data (UFC fights) I'm getting a negative cost. I've stripped the dataset down to just two features (the opponent and the round the fight ended in) and then took the z-score of each column (there's a rough sketch of that step below the matrix).
This is my design matrix (the real one is much bigger, but I get the same negative cost even with this small subset):
array([[ 1. , -0.50373455, -0.35651205],
[ 1. , -1.54975476, 0.84266484],
[ 1. , 0.63737841, -1.55568894],
[ 1. , 1.11284214, 0.84266484],
[ 1. , -1.07429103, 0.84266484],
[ 1. , -1.07429103, -1.55568894],
[ 1. , 0.25700742, 0.84266484],
[ 1. , -1.83503301, -0.35651205],
[ 1. , 1.20793489, -0.35651205],
[ 1. , 1.58830588, -1.55568894],
[ 1. , -1.16938378, 0.84266484],
[ 1. , -0.78901279, -0.35651205],
[ 1. , -0.50373455, -1.55568894],
[ 1. , 1.0177494 , -0.35651205],
[ 1. , -0.21845631, 0.84266484],
[ 1. , 0.92265665, -1.55568894],
[ 1. , 0.06682193, 0.84266484],
[ 1. , 1.30302764, -0.35651205],
[ 1. , 0.44719292, -0.35651205],
[ 1. , -0.69392004, 0.84266484],
[ 1. , 1.39812038, -1.55568894],
[ 1. , -0.97919828, 0.84266484],
[ 1. , 0.16191468, 0.84266484],
[ 1. , -1.54975476, 0.84266484],
[ 1. , -0.02827082, 0.84266484],
[ 1. , 0.63737841, -0.35651205],
[ 1. , -0.88410554, 0.84266484],
[ 1. , 0.06682193, 0.84266484],
[ 1. , -1.73994026, 0.84266484],
[ 1. , -0.12336356, 0.84266484],
[ 1. , -0.97919828, 0.84266484],
[ 1. , 0.8275639 , -1.55568894],
[ 1. , 0.73247116, 0.84266484],
[ 1. , 1.68339863, -1.55568894],
[ 1. , 0.35210017, -1.55568894],
[ 1. , -0.02827082, 0.84266484],
[ 1. , 1.30302764, 0.84266484]])
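For reference, the standardization and the bias column were built roughly like this; df, "opponent", and "round_ended" are placeholder names for my actual DataFrame and columns, not my exact code:

import numpy as np

# Rough sketch only: `df`, "opponent" and "round_ended" stand in for my real
# DataFrame and column names.
raw = df[["opponent", "round_ended"]].to_numpy(dtype=float)
zscored = (raw - raw.mean(axis=0)) / raw.std(axis=0)       # z-score each column
X = np.hstack([np.ones((zscored.shape[0], 1)), zscored])   # prepend the column of 1s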
My weights vector is initialized to all zeros:
array([[0.],
[0.],
[0.]])
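It's created with something along the lines of:

theta = np.zeros((3, 1))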
For completeness, here's the Y vector:
array([[0],
[0],
[1],
[1],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[1],
[1],
[0],
[1],
[1],
[0],
[0],
[1],
[1],
[1],
[1],
[0],
[0],
[1],
[1],
[1],
[1],
[0],
[1]], dtype=uint8)
Here are my cost, sigmoid, and predict functions:

import numpy as np

def cost_function(X, Y, theta):
    # Cross-entropy (log-loss) averaged over the m training examples
    m = len(Y)
    h = predict(X, theta)
    cost = (np.dot(-Y.T, np.log(h)) - np.dot((1 - Y).T, np.log(1 - h))) / m
    return cost

def sigmoid(z):
    # Logistic function: squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(X, theta):
    # Predicted probability that y = 1 for each row of X
    z = np.dot(X, theta)
    return sigmoid(z)
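For what it's worth, my understanding is that with theta still all zeros every prediction comes out as sigmoid(0) = 0.5, so the very first cost should be around -log(0.5) ≈ 0.693 rather than anything negative. A quick check would be something like:

theta = np.zeros((3, 1))
print(cost_function(X, Y, theta))   # I'd expect roughly 0.693 (i.e. -log(0.5)) here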
Here's the gradient descent function:
def gradient_descent(X, Y, theta, rate):
    # One batch gradient-descent step on the cross-entropy cost
    m = len(Y)
    h = predict(X, theta)
    gradient = rate * np.dot(X.T, (h - Y)) / m
    theta -= gradient
    return theta
Then I use this train function to call both of them over n iterations:
def train(X, Y, theta, rate, iters):
    # Run gradient descent for `iters` iterations, recording the cost each time
    cost_history = []
    for i in range(iters):
        theta = gradient_descent(X, Y, theta, rate)
        cost = cost_function(X, Y, theta)
        cost_history.append(cost)
        if i % 100 == 0:
            print("iter: " + str(i) + " cost: " + str(cost))
    return theta, cost_history
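I call it along these lines (the learning rate and iteration count here are just representative values, not necessarily my exact settings):

# rate and iters are placeholder values for illustration
theta = np.zeros((3, 1))
theta, cost_history = train(X, Y, theta, rate=0.1, iters=1000)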
Then at the end of this, the cost values printed and plotted from cost_history come out negative.
That's what I'm having trouble understanding: why is the cost negative? Is it a problem with the code, with the data, or is this how it's supposed to work and I'm missing something? I've spent the last day trying to figure it out and haven't gotten anywhere. With just these two features, the trained weights still correctly predict the outcome of a fight about 54% of the time on the test set, but the cost is negative.
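For reference, that 54% comes from thresholding the predicted probabilities at 0.5 and comparing them with the true labels, roughly like this (X_test and Y_test are placeholder names for my held-out split, not my exact variables):

preds = (predict(X_test, theta) >= 0.5).astype(int)   # 0/1 predictions from the probabilities
accuracy = np.mean(preds == Y_test)                    # fraction correct, about 0.54 for me
print(accuracy)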