Batch gradient descent for polynomial regression

Question

I am trying to move on from simple linear single-variable gradient descent into something more advanced: best polynomial fit for a set of points. I created a simple octave test script which allows me to visually set the points in a 2D space, then start the gradient dsecent algorithm and see how it gradually approaches the best fit.

Unfortunately, it doesn't work as good as it did with the simple single-variable linear regression: the results I get ( when I get them ) are inconsistent with the polynome I expect!

Here is the code:

dim=5;
h = figure();
axis([-dim dim -dim dim]);

hold on
index = 1;
data = zeros(1,2);
while(1)
    [x,y,b] = ginput(1);
    if( length(b) == 0 )
        break;
    endif
    plot(x, y, "b+");
    data(index, :) = [x y];
    index++;
endwhile

y = data(:, 2);
m = length(y);
X = data(:, 1);
X = [ones(m, 1), data(:,1), data(:,1).^2, data(:,1).^3 ];

theta = zeros(4, 1);

iterations = 100;
alpha = 0.001;
J = zeros(1,iterations);
for iter = 1:iterations
    theta -= ( (1/m) * ((X * theta) - y)' * X)' * alpha;

    plot(-dim:0.01:dim, theta(1) + (-dim:0.01:dim).*theta(2) + (-dim:0.01:dim).^2.*theta(3) + (-dim:0.01:dim).^3.*theta(4), "g-");

    J(iter) = sum( (1/m) * ((X * theta) - y)' * X);
end

plot(-dim:0.01:dim, theta(1) + (-dim:0.01:dim).*theta(2) + (-dim:0.01:dim).^2.*theta(3) + (-dim:0.01:dim).^3.*theta(4), "r-");

figure()
plot(1:iter, J);

I continuously get wrong results, even though it would seem that J is minimized correctly. I checked the plotting function with the normal equation ( which works correctly of course, and although I believe the error lies somewhere in the theta equation, I cannot figure out what it.

Konstantinos Monachopoulos Konstantinos Monachopoulos · Accepted Answer · 2015-01-08T10:38:43

i implemented your code and it seems to be just fine, the reason that you do not have the results that you want is that Linear regression or polynomial regression in your case suffers from local minimum when you try to minimize the objective function. The algorithm traps in local minimum during execution. i implement your code changing the step (alpha) and i saw that with smaller step it fits the data better but still you are trapping in local minimum.

Choosing random initialization point of thetas every time i am trapping in a different local minimum. If you are lucky you will find a better initial points for theta and fit the data better. I think that there are some algorithms that finds the best initial points.

Below i attach the results for random initial points and the results with Matlab's polyfit.

enter image description here

In the above plot replace "Linear Regression with Polynomial Regression", type error.

If you observe better the plot, you will see that by chance (using rand() ) i chose some initial points that leaded me to the best data fitting comparing the other initial points.... i am showing that with a pointer.

Batch gradient descent for polynomial regression

1 Answers