I am trying to run gradient descent and cannot get the same result as Octave's built-in fminunc, even when using exactly the same data.
My code is:
% Run 5000 iterations of gradient descent
for iter = 1:5000
  % Calculate the cost and the gradient at the current theta
  [cost, grad] = costFunction(initial_theta, X, y);
  % Theta = Old Theta - (Learning Rate * Gradient)
  initial_theta = initial_theta - (alpha * grad);
end
Here costFunction calculates the cost and gradient, given the examples (X, y) and the parameters (theta).
The built-in Octave function fminunc, calling the same costFunction with the same data, finds a much better answer in far fewer iterations.
Since fminunc uses the same cost function, I assume costFunction is correct.
I have tried decreasing the learning rate (in case I am hitting a local minimum) and increasing the number of iterations. The cost stops decreasing, so it seems to have found a minimum, but the final theta still has a much larger cost and is nowhere near as accurate.
Even if fminunc is using a better algorithm, shouldn't gradient descent eventually find the same answer with enough iterations and a small enough learning rate?
Or can anyone see if I am doing anything wrong?
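For comparison, here is a minimal self-contained sketch of the same update rule (written in Python/NumPy rather than Octave, with made-up data and a stand-in least-squares cost, not my actual costFunction). With a small enough alpha and enough iterations, this loop does converge to the exact minimizer, which is what I expected from my Octave code:

```python
import numpy as np

def cost_function(theta, X, y):
    # Stand-in for costFunction: least-squares cost and its gradient
    m = len(y)
    residual = X @ theta - y
    cost = (residual @ residual) / (2 * m)
    grad = X.T @ residual / m
    return cost, grad

# Toy data with an exact linear fit: y = 0 + 2*x, so the optimum is theta = [0, 2]
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

theta = np.zeros(2)   # same role as initial_theta
alpha = 0.1           # learning rate

for _ in range(5000):
    cost, grad = cost_function(theta, X, y)
    # Theta = Old Theta - (Learning Rate * Gradient)
    theta = theta - alpha * grad

print(theta)  # approaches [0, 2]
```

On this toy problem the loop reaches the true minimizer, so the update rule itself looks right to me; the difference with my real data may come down to the learning rate or the conditioning of the features.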
Thank you for any and all help.