1
votes

I am trying to run gradient descent and cannot get the same result as Octave's built-in fminunc when using exactly the same data.

My Code is

%for 5000 iterations
for iter = 1:5000

%%Calculate the cost and the new gradient
[cost, grad] = costFunction(initial_theta, X, y);


%%Gradient = Old Gradient - (Learning Rate * New Gradient)
initial_theta = initial_theta - (alpha * grad);

end 

Where costFunction calculates the cost and gradient, given the examples (X, y) and the parameters (theta).
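
For context, here is a minimal sketch of the kind of thing costFunction computes (this assumes logistic regression and is not necessarily my exact code):

function [cost, grad] = costFunction(theta, X, y)
  % Sketch only: logistic-regression cost and gradient over m examples.
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));              % sigmoid hypothesis
  cost = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
  grad = (1 / m) * (X' * (h - y));               % gradient, same shape as theta
end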

The built-in Octave function fminunc, calling the same costFunction with the same data, finds a much better answer in far fewer iterations.
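
For reference, the fminunc call is along these lines (a sketch; the 'GradObj' option tells fminunc that costFunction also returns the gradient):

options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);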

Given that fminunc uses the same cost function, I assume costFunction is correct.

I have tried decreasing the learning rate (in case I am hitting a local minimum) and increasing the number of iterations. The cost stops decreasing, so it seems to have found a minimum, but the final theta still has a much larger cost and is nowhere near as accurate.

Even if fminunc is using a better algorithm, shouldn't gradient descent eventually find the same answer with enough iterations and a smaller learning rate?

Or can anyone see if I am doing anything wrong?

Thank you for any and all help.

1
Just want to point out, lowering the learning rate will do nothing to prevent your algorithm from hitting a local optimum. A larger learning rate might successfully jump over a very small one, but it's very unlikely. You need to make sure the function you're optimising is convex, so it has only one optimum (the global one). – Jonathon Ashworth

1 Answer

2
votes

Your comment on the update step is wrong (that line updates theta, not the gradient), but the algorithm itself is fine.

With gradient descent it's easy to run into numerical problems, so I suggest performing feature normalization first.
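
A minimal normalization sketch (assuming X holds the raw features, one example per row, without the bias column; X_reg is then the normalized matrix with a column of ones prepended, which is what the loop below uses):

mu = mean(X);                                   % per-feature mean (row vector)
sigma = std(X);                                 % per-feature standard deviation
X_norm = (X - mu) ./ sigma;                     % zero mean, unit variance (Octave broadcasts)
X_reg = [ones(size(X_norm, 1), 1), X_norm];     % prepend the bias column of ones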

Also, if you're unsure about your learning rate, try adjusting it dynamically. Something like:

best_cost = Inf;
best_theta = initial_theta;
alpha = 1;

for iter = 1:500
  % Cost and gradient at the best theta found so far
  [cost, grad] = costFunction(best_theta, X_reg, y);

  if (cost < best_cost)
    % Still improving: take a gradient step and remember this cost
    best_theta = best_theta - alpha * grad;
    best_cost = cost;
  else
    % Cost did not improve: the step was too large, so shrink the learning rate
    alpha = alpha * 0.99;
  end
end

Moreover, remember that different answers can give the same decision boundary. For example, for the hypothesis h(x) = theta(0) + theta(1) * x(1) + theta(2) * x(2), these answers give the same boundary:

theta = [5, 10, 10];
theta = [10, 20, 20];
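
A quick way to see this (sketch, assuming a sigmoid hypothesis): the second theta is just 2 times the first, so theta' * x has the same sign for every x and both give the same prediction:

sigmoid = @(z) 1 ./ (1 + exp(-z));
x = [1; 0.3; -0.8];                    % some example point, first entry is the bias term
theta_a = [5; 10; 10];
theta_b = [10; 20; 20];                % = 2 * theta_a
same = (sigmoid(theta_a' * x) >= 0.5) == (sigmoid(theta_b' * x) >= 0.5)   % always 1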