1 vote

I'm implementing logistic regression based on the Coursera course exercise, both in Python and Octave. In Octave I managed to do it and achieve the right training accuracy, but in Python, since I don't have access to fminunc, I cannot figure out a workaround.

Currently, this is my code:

import numpy as np
import pandas

# load the two features and the label from the exercise data
df = pandas.DataFrame.from_csv('ex2data2.txt', header=None, index_col=None)
df.columns = ['x1', 'x2', 'y']
y = df[df.columns[-1]].as_matrix()
m = len(y)
y = y.reshape(m, 1)
X = df[df.columns[:-1]]
X = X.as_matrix()

from sklearn.preprocessing import PolynomialFeatures

feature_mapper = PolynomialFeatures(degree=6)
X = feature_mapper.fit_transform(X)

def sigmoid(z):
    return 1/(1+np.power(np.e, z))

def cost_function_reg(theta):
    # regularized logistic regression cost; theta[0] is excluded from the penalty
    _theta = theta.copy().reshape(-1, 1)
    shifted_theta = np.insert(_theta[1:], 0, 0)
    h = sigmoid(np.dot(X, _theta))
    reg = (_lambda / (2.0*m))* shifted_theta.T.dot(shifted_theta)
    J = ((1.0/m)*(-y.T.dot(np.log(h)) - (1 - y).T.dot(np.log(1-h)))) + reg
    return J

def gradient(theta):
    # gradient of the regularized cost, returned flattened for scipy
    _theta = theta.copy().reshape(-1, 1)
    shifted_theta = np.insert(_theta[1:], 0, 0)
    h = sigmoid(np.dot(X, _theta))
    gradR = _lambda*shifted_theta
    gradR.shape = (gradR.shape[0], 1)
    grad = (1.0/m)*(X.T.dot(h-y)+gradR)
    return grad.flatten()

from scipy.optimize import *
theta = fmin_ncg(cost_function_reg, initial_theta, fprime=gradient)
predictions = predict(theta, X)
accuracy = np.mean(np.double(predictions == y)) * 100
print 'Train Accuracy: %.2f' % accuracy

The output is:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 0.693147
         Iterations: 0
         Function evaluations: 22
         Gradient evaluations: 12
         Hessian evaluations: 0
Train Accuracy: 50.85

In Octave, the accuracy is 83.05.

Any help is appreciated.

I'd try several different optimizers (docs.scipy.org/doc/scipy/reference/optimize.html) and see if, say, bfgs performs better than ncg. - ev-br
I tried bfgs without supplying fprime and it gives similar results (~50% accuracy). With fprime (the gradient function there) I get an error. - Rafael Barros
if you're importing sklearn already, why not just use its logistic regression classifier to do the work for you (a rough example follows these comments)? Or is this more of a learning exercise? - Simon
This is a learning exercise. I want to implement my own to better use the tool in the future. - Rafael Barros
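
(For reference, a minimal sketch of what Simon's suggestion might look like; X and y are the arrays built in the question, and C=1.0 is an arbitrary choice for the inverse regularization strength, not a value from the original exercise.)

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=1.0)      # C is the inverse of the regularization strength
clf.fit(X, y.ravel())                # sklearn expects the labels as a flat 1-D array
print('sklearn train accuracy: %.2f' % (clf.score(X, y.ravel()) * 100))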

1 Answer

0 votes

There were two problems with that implementation:

The first one: fmin_ncg is not ideal for this minimization. I had used it in the previous exercise, but here it failed to find theta with that gradient function, even though the gradient is identical to the one in Octave.

Switching to

theta = fmin_bfgs(cost_function_reg, initial_theta)

fixed that issue.
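
For completeness, a minimal sketch of the full call; the zero initialization and _lambda = 1 are just the usual setup from the exercise (your values may differ):

from scipy.optimize import fmin_bfgs

_lambda = 1.0                            # regularization strength used in the exercise
initial_theta = np.zeros(X.shape[1])     # one parameter per mapped feature, starting at zero
theta = fmin_bfgs(cost_function_reg, initial_theta)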

The second issue was that the accuracy was being miscalculated. Once I optimized with fmin_bfgs and reached the cost that matches the Octave result (0.529), the two operands of (predictions == y) had mismatched shapes, so NumPy broadcasting produced a (118, 118) matrix instead of a (118, 1) vector of element-wise comparisons, which threw the mean off.
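
For example, flattening both arrays before comparing fixes it (this assumes predict returns the 0/1 predictions, as in the Octave exercise):

predictions = predict(theta, X)
# flatten both sides so the comparison is element-wise instead of broadcast to 118x118
accuracy = np.mean(predictions.ravel() == y.ravel()) * 100
print('Train Accuracy: %.2f' % accuracy)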