I am trying to implement the gradient descent algorithm to minimize the cost function for multiple linear regression, following the concepts explained in Andrew Ng's machine learning class. I am using Octave. However, when I run the code it fails to produce a solution: my theta values come out as "NaN". I have included the cost function code and the gradient descent code below. Can someone please help?
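Cost function:
function J = computeCostMulti(X, y, theta)
m = length(y); % number of training examples
J = 0;
h = X * theta;            % predictions for all training examples
s = sum((h - y) .^ 2);    % sum of squared errors
J = s / (2 * m);          % halved mean squared error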
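For reference, this is the cost I believe computeCostMulti computes, written out as a formula. It is only my reading of the code above, assuming X is the design matrix with one row per training example and y is the column vector of targets:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( (X\theta)_i - y_i \right)^2 = \frac{1}{2m} (X\theta - y)^{T} (X\theta - y)
$$

Here h = X*theta corresponds to $X\theta$, s to the sum of squared errors, and m to the number of training examples.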
Gradient Descent Code:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1); % cost after each iteration
for iter = 1:num_iters
  a = X * theta - y;          % prediction errors
  b = alpha * (X' * a);       % learning rate times the unscaled gradient
  theta = theta - (b / m);    % simultaneous update of all parameters
  J_history(iter) = computeCostMulti(X, y, theta);
end
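Written as a formula, the update that each iteration of gradientDescentMulti performs is, as far as I can tell, the vectorized rule below. This is my own reading of the loop, with $\alpha$ standing for alpha and m for the number of training examples:

$$
\theta := \theta - \frac{\alpha}{m} X^{T} (X\theta - y)
$$

In the code, a is $X\theta - y$, b is $\alpha X^{T}(X\theta - y)$, and the division by m happens when theta is updated.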