2
votes

I am trying to implement the gradient descent algorithm to minimize a cost function for multiple linear regression. I am using the concepts explained in the machine learning class by Andrew Ng, and I am working in Octave. However, when I execute the code it fails to produce a solution: my theta values all compute to "NaN". I have attached the cost function code and the gradient descent code. Can someone please help?

Cost function :

function J = computeCostMulti(X, y, theta)

m = length(y); % number of training examples

J = 0;

h = X*theta;           % hypothesis: m x 1 vector of predictions
s = sum((h - y).^2);   % sum of squared errors
J = s/(2*m);           % cost: mean squared error scaled by 1/2

end
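As a quick sanity check of the cost function, here is a tiny made-up example (the numbers are purely illustrative):

X = [1 1; 1 2; 1 3];   % 3 training examples: intercept column plus one feature
y = [1; 2; 3];
theta = [0; 0];
J = computeCostMulti(X, y, theta)   % squared errors sum to 1+4+9 = 14, so J = 14/(2*3) = 2.3333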

Gradient Descent Code:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

  a = X*theta - y;         % error vector: predictions minus targets
  b = alpha*(X'*a);        % learning rate times gradient (up to the 1/m factor)
  theta = theta - (b/m);   % simultaneous update of all parameters

  J_history(iter) = computeCostMulti(X, y, theta);
end

end
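For context, a minimal sketch of how these two functions are usually driven (the learning rate and iteration count here are placeholder values, and X is assumed to already contain the leading column of ones):

alpha = 0.01;       % placeholder learning rate
num_iters = 400;    % placeholder iteration count
theta = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);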

2 Answers

2
votes

I implemented this algorithm in GNU Octave and I separated it into two different functions. First, you need to define a gradient function:

function [thetaNew] = compute_gradient (X, y, theta, m)
    % theta is a row vector here, so it is transposed before multiplying
    thetaNew = (X'*(X*theta'-y))*1/m;
end

Then, to run the gradient descent algorithm, use a second function:

function [theta] = gd (X, y, alpha, num_iters)
    theta = zeros(1,columns(X));   % theta starts as a row vector of zeros
    for iter = 1:num_iters,
        % the gradient is a column vector, so transpose it back to a row
        theta = theta - alpha*compute_gradient(X,y,theta,rows(y))';
    end
end

Edit 1: This algorithm works both for multiple linear regression (several independent variables) and for linear regression with a single independent variable. I tested it with this dataset:

age height  weight
41  62  115
21  62  140
31  62  125
21  64  125
31  64  145
41  64  135
41  72  165
31  72  190
21  72  175
31  66  150
31  66  155
21  64  140

For this example we want to predict

predicted weight = theta0 + theta1*age + theta2*height

I used these input values for alpha and num_iters

alpha=0.00037
num_iters=3000000
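If I understand the setup, the call for this dataset would look roughly like this (the variable names are only illustrative):

data = [41 62 115; 21 62 140; 31 62 125; 21 64 125; 31 64 145; 41 64 135;
        41 72 165; 31 72 190; 21 72 175; 31 66 150; 31 66 155; 21 64 140];
X = [ones(rows(data), 1), data(:, 1:2)];   % intercept column, then age and height
y = data(:, 3);                            % weight
theta = gd(X, y, 0.00037, 3000000)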

The output of running gradient descent for this experiment is as follows:

theta =
-170.10392    -0.40601     4.99799

So the equation is

predicted weight = -170.10392 - .406*age + 4.997*height

This is very close to the true minimum, since the results for this problem obtained with PSPP (an open source alternative to SPSS) are

predicted weight = -175.17 - .40*age + 5.07*height

Hope this helps to confirm that the same gradient descent algorithm works for both multiple linear regression and standard linear regression.

1
vote

I did find the bug, and it was not in the logic of the cost function or the gradient descent function. It was in the feature normalization logic: I was accidentally returning the wrong variable, and that was causing the output to be "NaN".

It was a dumb mistake:

What I was doing previously:

mu = mean(a);
sigma = std(a);
b = (X .- mu);
X = b./sigma;    % result is stored in X instead of the return variable X_norm

Instead, what I should be doing:

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X 
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================


mu= mean(X);
sigma = std(X);
a=(X.-mu);
X_norm= a./sigma;

% ============================================================

end

So clearly I should have been using X_norm instead of X, and that is what was causing the code to give the wrong output.
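For completeness, a minimal sketch of how the corrected function plugs into the gradient descent code above (the learning rate and iteration count are placeholder values):

[X_norm, mu, sigma] = featureNormalize(X);     % normalize the raw features
X_aug = [ones(size(X_norm, 1), 1), X_norm];    % then prepend the intercept column
theta = zeros(size(X_aug, 2), 1);
[theta, J_history] = gradientDescentMulti(X_aug, y, theta, 0.01, 400);   % placeholder alpha and num_iters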