2
votes

I am trying to implement the gradient descent algorithm to minimize a cost function for multiple linear regression. I am using the concepts explained in the machine learning class by Andrew Ng, and I am working in Octave. However, when I execute the code it fails to produce a solution: my theta values all compute to "NaN". I have attached the cost function code and the gradient descent code. Can someone please help?

Cost function :

function J = computeCostMulti(X, y, theta)

m = length(y); % number of training examples

J = 0;

h = X*theta;           % hypothesis: m x 1 vector of predictions
s = sum((h - y).^2);   % sum of squared errors
J = s/(2*m);           % cost: mean squared error scaled by 1/2

end
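As a quick sanity check of the cost function, here is a tiny made-up example (the numbers are purely illustrative):

X = [1 1; 1 2; 1 3];   % 3 training examples: intercept column plus one feature
y = [1; 2; 3];
theta = [0; 0];
J = computeCostMulti(X, y, theta)   % squared errors sum to 1+4+9 = 14, so J = 14/(2*3) = 2.3333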

Gradient Descent Code:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

  a = X*theta - y;         % error vector: predictions minus targets
  b = alpha*(X'*a);        % learning rate times gradient (up to the 1/m factor)
  theta = theta - (b/m);   % simultaneous update of all parameters

  J_history(iter) = computeCostMulti(X, y, theta);
end

end
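For context, a minimal sketch of how these two functions are usually driven (the learning rate and iteration count here are placeholder values, and X is assumed to already contain the leading column of ones):

alpha = 0.01;       % placeholder learning rate
num_iters = 400;    % placeholder iteration count
theta = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);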

2 Answers

2
votes

I implemented this algorithm in GNU Octave and I separated it into two different functions. First, you need to define a gradient function:

function [thetaNew] = compute_gradient (X, y, theta, m)
    % theta is a row vector here, so it is transposed before multiplying
    thetaNew = (X'*(X*theta'-y))*1/m;
end

Then, to run the gradient descent algorithm, use a second function:

function [theta] = gd (X, y, alpha, num_iters)
    theta = zeros(1,columns(X));   % theta starts as a row vector of zeros
    for iter = 1:num_iters,
        % the gradient is a column vector, so transpose it back to a row
        theta = theta - alpha*compute_gradient(X,y,theta,rows(y))';
    end
end

Edit 1: This algorithm works both for multiple linear regression (several independent variables) and for linear regression with a single independent variable. I tested it with this dataset:

age height  weight
41  62  115
21  62  140
31  62  125
21  64  125
31  64  145
41  64  135
41  72  165
31  72  190
21  72  175
31  66  150
31  66  155
21  64  140

For this example we want to predict

predicted weight = theta0 + theta1*age + theta2*height

I used these input values for alpha and num_iters

alpha=0.00037
num_iters=3000000
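If I understand the setup, the call for this dataset would look roughly like this (the variable names are only illustrative):

data = [41 62 115; 21 62 140; 31 62 125; 21 64 125; 31 64 145; 41 64 135;
        41 72 165; 31 72 190; 21 72 175; 31 66 150; 31 66 155; 21 64 140];
X = [ones(rows(data), 1), data(:, 1:2)];   % intercept column, then age and height
y = data(:, 3);                            % weight
theta = gd(X, y, 0.00037, 3000000)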

The output of running gradient descent for this experiment is as follows:

theta =
-170.10392    -0.40601     4.99799

So the equation is

predicted weight = -170.10392 - .406*age + 4.997*height

This is very close to the true minimum, since the results for this problem obtained with PSPP (an open source alternative to SPSS) are

predicted weight = -175.17 - .40*age + 5.07*height

Hope this helps to confirm that the same gradient descent algorithm works for both multiple linear regression and standard linear regression.

1
vote

I did find the bug, and it was not in the logic of the cost function or the gradient descent function. It was in the feature normalization logic: I was accidentally returning the wrong variable, and that was causing the output to be "NaN".

It was a dumb mistake:

What I was doing previously:

mu = mean(a);
sigma = std(a);
b = (X .- mu);
X = b./sigma;    % result is stored in X instead of the return variable X_norm

Instead, what I should be doing:

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X 
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================


mu= mean(X);
sigma = std(X);
a=(X.-mu);
X_norm= a./sigma;

% ============================================================

end

So clearly I should have been using X_norm instead of X, and that is what was causing the code to give the wrong output.
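For completeness, a minimal sketch of how the corrected function plugs into the gradient descent code above (the learning rate and iteration count are placeholder values):

[X_norm, mu, sigma] = featureNormalize(X);     % normalize the raw features
X_aug = [ones(size(X_norm, 1), 1), X_norm];    % then prepend the intercept column
theta = zeros(size(X_aug, 2), 1);
[theta, J_history] = gradientDescentMulti(X_aug, y, theta, 0.01, 400);   % placeholder alpha and num_iters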