
I wrote these two code implementations to compute the gradient grad for the regularized logistic regression algorithm. The inputs are: a scalar variable n1 that represents the value n+1, a column vector theta of size n+1, a matrix X of size [m x (n+1)], a column vector y of size m, and a scalar regularization factor lambda.

The first implementation computes the correct result; the second one outputs a wrong result. I believe these implementations are doing the same thing, so how can they output different results?

%correct
tmp = zeros(n1, 1);
tmp(2:n1,:) = (lambda / m) * theta(2:n1);
grad = (1 / m) * (X' * (sigmoid(X * theta) - y)) + tmp;

%wrong
grad(1,:) = (1 / m) * (X(:,1)' * (sigmoid(X(:,1) * theta(1,:)) - y));
grad(2:n1,:) = (1 / m) * (X(:,2:n1)' * (sigmoid(X(:,2:n1) * theta(2:n1,:)) - y)) + ((lambda / m) * theta(2:n1));

Where sigmoid(z) returns g as in:

g = zeros(size(z));
g = 1 ./ (1 + exp( -z ));

1 Answer


The problem lies in matrix multiplication.

First, suppose m = 5 and n1 = 5. This means X is a 5*5 matrix and both theta and y are vectors of 5 elements.

Now in the first case the sigmoid term provides a 5*5 matrix, and X' (the transpose of X, not its inverse) is also 5*5. Since it is a matrix multiplication, the 1st row of X' multiplies with the 1st column of g, which means you need all the g values to calculate the first row of grad.

Now in the second case, for the first row of grad, the sigmoid term also provides a 5*5 matrix, but a different one, because now only X(:,1)' is used, which is a 1*5 matrix. This means the matrix multiplication will provide a different output, and hence the result is different.

I hope this is clear now.

**Previously I wrote my answer assuming that theta and y are row vectors, but in your example you have clearly mentioned that you are using column vectors. However, matrix multiplication is still the problem. If you have a clear understanding of matrix multiplication, then you will easily see the problem.

Let's start with the correct equation:

X * theta is an m*1 vector, hence sigmoid(X * theta) is an m*1 vector, and so is (sigmoid(X * theta) - y).

X' * (sigmoid(X * theta) - y) is the main part here, because the factors (1/m) and (lambda/m) are scalars. X' is an n1*m matrix, so X' * (sigmoid(X * theta) - y) is an n1*1 vector, and finally your grad is an n1*1 vector (which is also 5*1 here, since m = n1 = 5).
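
You can check these dimensions directly in Octave/MATLAB. A minimal sketch with made-up data (the values of m and n1 are arbitrary here; only the sizes matter):

% hypothetical test data, just to inspect the sizes
m = 5; n1 = 3;
X = rand(m, n1); theta = rand(n1, 1); y = double(rand(m, 1) > 0.5);
h = 1 ./ (1 + exp(-(X * theta)));  % sigmoid(X * theta), an m*1 vector
size(h - y)                        % [5 1], i.e. m*1
size(X' * (h - y))                 % [3 1], i.e. n1*1 -- the size of grad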

If you look closely, grad(1,1) depends upon X'(1,:) and the whole sigmoid vector, and you have calculated that sigmoid using all the theta values. So grad(1,1) does not depend only on theta(1), and you can't simply replace theta with theta(1), which is what you are doing in the wrong case.
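
If you want to keep the split form of the second implementation, the fix is to compute the sigmoid once from the full X and theta, and split only the regularization term. A sketch, reusing the variable names from your question:

% compute the hypothesis once, using ALL columns of X and ALL theta values
h = sigmoid(X * theta);                     % m*1
grad(1,:) = (1 / m) * (X(:,1)' * (h - y));  % bias term, no regularization
grad(2:n1,:) = (1 / m) * (X(:,2:n1)' * (h - y)) + ((lambda / m) * theta(2:n1));

This gives exactly the same result as your first implementation, because both slices of grad are now computed from the same h.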