
I'm attempting to implement gradient descent using the code from:

Gradient Descent implementation in octave

I've amended the code to the following:

X = [1; 1; 1]
y = [1; 0; 1]
m = length(y);
X = [ones(m, 1), data(:,1)];   % NB: `data` must already be loaded at this point
theta = zeros(2, 1);
iterations = 2000;
alpha = 0.001;

for iter = 1:iterations
     theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end

theta

Which gives the following output:

X =

   1
   1
   1

y =

   1
   0
   1

theta =

   0.32725
   0.32725

theta is a 1x2 matrix, but shouldn't it be 1x3, as the output (y) is 3x1?

So I should be able to multiply theta by a training example to make a prediction, but I cannot multiply x by theta, since x is 1x3 and theta is 1x2?

Update:

%X = [1 1; 1 1; 1 1]
%y = [1 1; 0 1; 1 1]

X = [1 1 1; 1 1 1; 0 0 0]
y = [1 1 1; 0 0 0; 1 1 1]

m = length(y);
X = [ones(m, 1), X];
theta = zeros(4, 1);
theta

iterations = 2000;
alpha = 0.001;

for iter = 1:iterations
     theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end

% to make a prediction
m = size(X, 1); % Number of training examples
p = zeros(m, 1);
htheta = sigmoid(X * theta);
p = htheta >= 0.5;
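
Note that `sigmoid` is not a built-in Octave function; for the prediction step above to run, it has to be defined somewhere on the path (in the course exercises it ships as a separate file). A minimal stand-in:

```octave
% minimal logistic sigmoid, applied element-wise
sigmoid = @(z) 1 ./ (1 + exp(-z));
```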

1 Answer


You are misinterpreting the dimensions here. Your data consists of 3 points, each having a single dimension. Furthermore, you add a dummy dimension of 1s:

X = [ones(m, 1), data(:,1)]; 

thus

octave:1> data = [1;2;3]
data =

   1
   2
   3

octave:2> m = length(data); [ones(m, 1), data(:,1)]
ans =

   1   1
   1   2
   1   3

and theta is your parametrization, which you apply via (this is math notation, not code)

h(x) = x1 * theta1 + theta0

thus your theta should have two dimensions: one is the weight for your dummy dimension (the so-called bias) and one for the actual X dimension. If your X has K dimensions, theta will have K+1. Thus, after adding the dummy dimension, the matrices have the following shapes:

X is 3x2
y is 3x1
theta is 2x1

so

X * theta is 3x1

which is the same shape as y.
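
To make this concrete, here is a runnable version of the loop from the question with consistent shapes, using the 3-point, single-feature `data` from the session above; `y`, `alpha`, and the iteration count are kept from the question, and the update is algebraically the same as the original one-liner:

```octave
data = [1; 2; 3];
y    = [1; 0; 1];
m    = length(y);

X = [ones(m, 1), data];   % 3x2: dummy column of 1s + the single feature
theta = zeros(2, 1);      % 2x1: bias weight + feature weight

alpha = 0.001;
iterations = 2000;

for iter = 1:iterations
    % X * theta is 3x1, the same shape as y, so the update is well-defined;
    % X' * (X * theta - y) is identical to ((X * theta - y)' * X)'
    theta = theta - alpha * (1/m) * X' * (X * theta - y);
end

theta          % 2x1 parameter vector
h = X * theta  % 3x1 predictions, one per training point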