
I'm just starting out learning machine learning and have been trying to fit a polynomial to data generated with a sine curve. I know how to do this in closed form, but I'm trying to get it to work with gradient descent too. However, my weights explode to crazy values, even with a very large penalty term. What am I doing wrong? Here is the code:

import numpy as np
import matplotlib.pyplot as plt
from math import pi

N = 10
D = 5

X = np.linspace(0,100, N)
Y = np.sin(0.1*X)*50
X = X.reshape(N, 1)


Xb = np.array([[1]*N]).T
for i in range(1, D):
    Xb = np.concatenate((Xb, X**i), axis=1)

#Randomly initialize the weights
w = np.random.randn(D)/np.sqrt(D)

#Solving in closed form works
#w = np.linalg.solve((Xb.T.dot(Xb)),Xb.T.dot(Y))
#Yhat = Xb.dot(w)

#Gradient descent
learning_rate = 0.0001
for i in range(500):
    Yhat = Xb.dot(w)
    delta = Yhat - Y
    w = w - learning_rate*(Xb.T.dot(delta) + 100*w)

print('Final w: ', w)
plt.scatter(X, Y)
plt.plot(X,Yhat)
plt.show()

Thanks!


1 Answer


When updating the weights, you should subtract the learning rate times the gradient, with the gradient divided by the training set size N. The penalty term also has to be divided by N. But the main problem is that your learning rate is too large. For future debugging, it helps to print the cost every iteration, so you can see whether gradient descent is actually converging and whether the learning rate is too small or just right.

Below is the code for a 2nd-degree polynomial that found the optimum thetas (as you can see, the learning rate is really small). I've also added the cost function.

import numpy as np

N = 2
D = 2

# Same setup as in the question, but with N = 2 points and D = 2 features
X = np.linspace(0, 100, N).reshape(N, 1)
Y = np.sin(0.1 * X.flatten()) * 50
Xb = np.concatenate([X**i for i in range(D)], axis=1)  # columns: 1, x
w = np.random.randn(D) / np.sqrt(D)

#Gradient descent
learning_rate = 0.000000000001
for i in range(200):
    Yhat = Xb.dot(w)
    delta = Yhat - Y
    # Mean squared error, printed each iteration to monitor convergence
    print((1/N) * np.sum(np.dot(delta, np.transpose(delta))))

    # Gradient scaled by 1/N (no penalty term in this version)
    w = w - learning_rate * np.dot(delta, Xb) * (1/N)
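
Putting that advice together on the original N = 10, D = 5 problem, here is a minimal sketch that divides both the gradient and the penalty by N, uses a sane learning rate, and tracks the cost. It also standardizes the non-constant feature columns, an extra step beyond the answer above: on this data x^4 reaches about 1e8, and no single learning rate works for raw features on such different scales. The seed, learning rate, and l2 value are illustrative choices, not the only ones that work.

```python
import numpy as np

np.random.seed(0)
N, D = 10, 5

X = np.linspace(0, 100, N)
Y = np.sin(0.1 * X) * 50

# Design matrix with columns 1, x, x^2, ..., x^(D-1)
Xb = np.column_stack([X**i for i in range(D)])

# Standardize the non-constant columns so one learning rate suits every
# feature; without this, the x^4 column (~1e8) dominates the gradient.
Xb[:, 1:] = (Xb[:, 1:] - Xb[:, 1:].mean(axis=0)) / Xb[:, 1:].std(axis=0)

w = np.random.randn(D) / np.sqrt(D)
learning_rate = 0.1
l2 = 1.0  # penalty strength; 100 as in the question is large for N = 10

costs = []
for i in range(500):
    Yhat = Xb.dot(w)
    delta = Yhat - Y
    # Regularized mean squared error, tracked for debugging
    costs.append((delta @ delta + l2 * w @ w) / N)
    # Gradient and penalty both divided by N, as suggested above
    w = w - learning_rate * (Xb.T.dot(delta) + l2 * w) / N

print('cost: %.3f -> %.3f' % (costs[0], costs[-1]))
```

Watching `costs` shrink steadily (rather than oscillate or blow up) is the quickest check that the learning rate is in the right range.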