I wrote these two code implementations to compute the gradient for regularized logistic regression. The inputs are: a scalar n1 holding the value n+1, a column vector theta of size (n+1) x 1, a matrix X of size m x (n+1), a column vector y of size m x 1, and a scalar regularization factor lambda; m is the number of training examples.
The first snippet computes the correct result; the second one outputs a wrong result. I believe the two implementations are doing the same thing, so how can they produce different results?
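For reference, the gradient I believe both snippets should be computing is the one for the regularized logistic regression cost, with the intercept term theta(1) left unregularized (writing x_i for the i-th row of X as a column vector):

grad(1) = (1/m) * sum_{i=1..m} (sigmoid(theta' * x_i) - y_i) * x_i(1)
grad(j) = (1/m) * sum_{i=1..m} (sigmoid(theta' * x_i) - y_i) * x_i(j) + (lambda/m) * theta(j),   for j = 2..n1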
% correct
tmp = zeros(n1, 1);
tmp(2:n1,:) = (lambda / m) * theta(2:n1);                  % regularization term; theta(1) is not regularized
grad = (1 / m) * (X' * (sigmoid(X * theta) - y)) + tmp;    % gradient of the regularized cost
% wrong
grad(1,:) = (1 / m) * (X(:,1)' * (sigmoid(X(:,1) * theta(1,:)) - y));                                              % intended gradient for theta(1)
grad(2:n1,:) = (1 / m) * (X(:,2:n1)' * (sigmoid(X(:,2:n1) * theta(2:n1,:)) - y)) + ((lambda / m) * theta(2:n1));   % intended gradient for theta(2:n1), with regularization
Where sigmoid(z) returns g as in:
g = zeros(size(z));
g = 1 ./ (1 + exp( -z ));
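In case it helps, this is the minimal driver I use to reproduce the difference. The values for m, X, y, theta and lambda are just toy numbers I made up for testing, and sigmoid is the function above:

% toy problem: m = 5 examples, n = 2 features plus intercept, so n1 = 3
m = 5; n1 = 3;
X = [ones(m,1), (1:m)', (m:-1:1)'];   % bias column plus two made-up features
y = [1; 0; 1; 0; 1];
theta = [0.1; -0.2; 0.3];
lambda = 1;

% first implementation
tmp = zeros(n1, 1);
tmp(2:n1,:) = (lambda / m) * theta(2:n1);
grad1 = (1 / m) * (X' * (sigmoid(X * theta) - y)) + tmp;

% second implementation
grad2 = zeros(n1, 1);
grad2(1,:) = (1 / m) * (X(:,1)' * (sigmoid(X(:,1) * theta(1,:)) - y));
grad2(2:n1,:) = (1 / m) * (X(:,2:n1)' * (sigmoid(X(:,2:n1) * theta(2:n1,:)) - y)) + ((lambda / m) * theta(2:n1));

disp([grad1, grad2])   % the two columns come out different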