I am working through Andrew Ng's Machine Learning course on Coursera, implementing all the code in Python rather than MATLAB.

In Programming Exercise 3, I implemented my regularized logistic regression cost function in a vectorized form:

def compute_cost_regularized(theta, X, y, lda):
    reg =lda/(2*len(y)) * np.sum(theta**2) 
    return 1/len(y) * np.sum(-y @ np.log(sigmoid(X@theta)) 
                         - (1-y) @ np.log(1-sigmoid(X@theta))) + reg

On the following test inputs:

theta_test = np.array([-2,-1,1,2])
X_test = np.concatenate((np.ones((5,1)), 
         np.fromiter((x/10 for x in range(1,16)), float).reshape((3,5)).T), axis = 1)
y_test = np.array([1,0,1,0,1])
lambda_test = 3

the above cost function outputs 3.734819396109744. However, according to the skeleton MATLAB code provided to us, the correct output should be 2.534819. I'm puzzled because I cannot find anything wrong with my cost function, yet I believe it has a bug. I also implemented it in Programming Exercise 2 for the binary classification case, where it works fine and gives a result close to the expected value.

I thought one reason could be that I constructed my *_test input arrays wrongly by misinterpreting the provided skeleton MATLAB code, which is:

theta_t = [-2; -1; 1; 2];
X_t = [ones(5,1) reshape(1:15,5,3)/10];
y_t = ([1;0;1;0;1] >= 0.5);
lambda_t = 3;

However, I ran them through an Octave interpreter to see what they actually are, and made sure I could match them exactly in Python.
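As a cross-check on that construction, here is a minimal sketch of a more direct translation of the MATLAB line. It assumes NumPy's `order='F'` (column-major) reshape fills the array the same way MATLAB's `reshape` does; `X_alt` is a name introduced only for this illustration.

```python
import numpy as np

# Construction used above: fill 3 rows of 5 values, then transpose to 5x3.
X_test = np.concatenate((np.ones((5, 1)),
         np.fromiter((x/10 for x in range(1, 16)), float).reshape((3, 5)).T), axis=1)

# Direct translation of [ones(5,1) reshape(1:15,5,3)/10]:
# order='F' fills column by column, as MATLAB/Octave does.
X_alt = np.hstack((np.ones((5, 1)),
                   np.arange(1, 16).reshape((5, 3), order='F') / 10))

# Both constructions should produce the same 5x4 matrix.
print(np.allclose(X_test, X_alt))
```

If the two matrices agree, the test inputs are unlikely to be the source of the discrepancy.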

Furthermore, the gradient computed from these inputs using my own vectorized, regularized gradient function is also correct. Lastly, I decided to just proceed with the computation and examine the prediction results. The accuracy of my predictions was far lower than the expected accuracy, which gives all the more reason to suspect that something is wrong with my cost function and is throwing everything else off.

Help please! Thank you.

1 Answer

Recall that in regularization you do not regularize the bias coefficient. Not only is the penalty left out of the bias component of the gradient when performing gradient descent, the bias is also left out of the penalty in the cost function. Your code has a small mistake: it includes the bias in that sum (see cell #18 of the notebook you linked; the sum should start at j = 1, but yours starts at j = 0). You therefore need to sum over theta from the second element to the end, not from the first. You can verify this on page 9 of the ex2.pdf assignment in your GitHub repo. This explains the inflated cost: the bias unit is being included in the regularization.
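For reference, this is the regularized cost as I understand it from the course notation, where m is the number of examples, n the number of features, and h_θ the sigmoid hypothesis. Note that the penalty sum runs from j = 1, so θ_0 does not appear in it:

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log h_\theta(x^{(i)})
            - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]
            + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
```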

Therefore, when computing the regularization term reg, index theta so that the sum starts from the second element:

def compute_cost_regularized(theta, X, y, lda):
    reg =lda/(2*len(y)) * np.sum(theta[1:]**2) # Change here
    return 1/len(y) * np.sum(-y @ np.log(sigmoid(X@theta)) 
                         - (1-y) @ np.log(1-sigmoid(X@theta))) + reg

With this change, and with your test values and a sigmoid function defined, I get the answer you're expecting:

In [8]: def compute_cost_regularized(theta, X, y, lda):
   ...:     reg =lda/(2*len(y)) * np.sum(theta[1:]**2)
   ...:     return 1/len(y) * np.sum(-y @ np.log(sigmoid(X@theta))
   ...:                          - (1-y) @ np.log(1-sigmoid(X@theta))) + reg
   ...:

In [9]: def sigmoid(z):
   ...:     return 1 / (1 + np.exp(-z))
   ...:

In [10]: theta_test = np.array([-2,-1,1,2])
    ...: X_test = np.concatenate((np.ones((5,1)),
    ...:          np.fromiter((x/10 for x in range(1,16)), float).reshape((3,5)).T), axis = 1)
    ...: y_test = np.array([1,0,1,0,1])
    ...: lambda_test = 3
    ...:

In [11]: compute_cost_regularized(theta_test, X_test, y_test, lambda_test)
Out[11]: 2.5348193961097438
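As a sanity check, the gap between the two costs should equal the penalty contributed by the bias term alone, lambda / (2m) * theta_0^2. The sketch below uses only the numbers from the question; `bias_penalty` and `gap` are names introduced for illustration.

```python
# Values taken from the question's test inputs.
theta_test = [-2, -1, 1, 2]
y_test = [1, 0, 1, 0, 1]
lambda_test = 3

# Penalty contributed by the bias term alone: lambda / (2m) * theta_0^2.
bias_penalty = lambda_test / (2 * len(y_test)) * theta_test[0] ** 2

# Gap between the cost reported in the question and the corrected cost.
gap = 3.734819396109744 - 2.5348193961097438

print(bias_penalty)                    # 1.2
print(abs(gap - bias_penalty) < 1e-9)  # True
```

The difference of 1.2 matches 3 / 10 * 4, which is consistent with the bias term being the only thing wrong in the original cost function.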