I am currently taking the Machine Learning course on Coursera and I am trying to implement logistic regression. I am using gradient descent to minimize the cost function, and I have to write a function called costFunctionReg.m that returns both the cost and the gradient of each parameter, evaluated at the current set of parameters.
The problem is better described below (the assignment's cost and gradient, in the same 1-based indexing as the MATLAB code):

    J(theta) = (1/m) * sum_{i=1..m} [ -y(i)*log(h(i)) - (1 - y(i))*log(1 - h(i)) ]
             + (lambda/(2*m)) * sum_{j=2..n} theta(j)^2,    where h = sigmoid(X * theta)

    grad(1) = (1/m) * sum_{i=1..m} (h(i) - y(i)) * X(i,1)
    grad(j) = (1/m) * sum_{i=1..m} (h(i) - y(i)) * X(i,j) + (lambda/m) * theta(j),    for j = 2..n

My cost function is working, but the gradient computation is not. Please note that I would prefer to implement this with explicit loops rather than with vectorized operations.
I am computing theta[0] (theta(1) in MATLAB) separately, as it is not regularized, i.e. the regularization term (the one with lambda) is not applied to it.
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y);      % number of training examples
n = length(theta);  % number of parameters (features)

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta.

% ---------------------- 1. Compute the cost ----------------------
% hypothesis
h = sigmoid(X * theta);

for i = 1 : m
    % the cost for the ith example, before regularization
    J = J - ( y(i) * log(h(i)) ) - ( (1 - y(i)) * log(1 - h(i)) );

    % adding the regularization term
    for j = 2 : n
        J = J + (lambda / (2*m)) * theta(j)^2;
    end
end
J = J / m;

% ---------------------- 2. Compute the gradients ----------------------
% not regularizing theta[0], i.e. theta(1) in MATLAB
j = 1;
for i = 1 : m
    grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j);
end

for j = 2 : n
    for i = 1 : m
        grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j) + lambda * theta(j);
    end
end
grad = (1/m) * grad;
% =============================================================

end
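To show the mismatch concretely, here is a small check against the standard vectorized gradient formula on a tiny made-up dataset (the numbers are arbitrary; sigmoid.m is the helper provided with the assignment):

X = [1 2; 1 3; 1 4];    % made-up data; the first column is the intercept term
y = [0; 1; 1];
theta = [0.5; -0.5];
lambda = 1;
m = length(y);

[J, grad] = costFunctionReg(theta, X, y, lambda);

% standard vectorized gradient for regularized logistic regression
h = sigmoid(X * theta);
grad_ref = (X' * (h - y)) / m;
grad_ref(2:end) = grad_ref(2:end) + (lambda / m) * theta(2:end);

disp([grad grad_ref])   % the first entries match, the regularized entries do not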
What am I doing wrong?
You need a column of 0s to the left of X which represents the offset term that you don't have to regularize. Now you are applying the non-regularization to the first feature (first column), not to the offset term. Try doing X = [zeros(size(X,1),1) X] - Sembei Norimaki
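Taking that comment literally, the preprocessing before the call would look something like this (a sketch of the commenter's suggestion only; the extra leading entry in theta is my own addition so the dimensions still match, and note the course assignments normally prepend a column of ones for the intercept, not zeros):

% sketch of the comment's suggestion, applied before calling costFunctionReg
X = [zeros(size(X, 1), 1) X];   % prepend an offset column that is never regularized
theta = [0; theta];             % hypothetical: theta needs a matching extra entry
[J, grad] = costFunctionReg(theta, X, y, lambda);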