
I'm currently learning about gradient descent, so I wrote a piece of code that fits a linear regression with gradient descent. The line I get, however, is not the best fit. I calculated the error of both the gradient-descent fit and the least-squares fit, and no matter what data I use, least squares always gives a much lower error. I then looked at the slope and y-intercept each method produces: the y-intercept from gradient descent is always very close to zero, as if it isn't being updated properly. I find this pretty strange and have no idea what is going on. Am I somehow implementing gradient descent incorrectly?

import matplotlib.pyplot as plt
# Data lists (the actual dataset is omitted here); fill these in before running.
datax = []
datay = []
def gradient(b_current, m_current, learningRate):
    # One gradient-descent step on the mean squared error.
    bgradient = 0
    mgradient = 0
    N = float(len(datax))
    for i in range(0, len(datax)):
        # Partial derivatives of the MSE with respect to b and m.
        bgradient += (-2/N) * (datay[i] - ((m_current*datax[i]) + b_current))
        mgradient += (-2/N) * datax[i] * (datay[i] - ((m_current*datax[i]) + b_current))
    newb = b_current - (bgradient*learningRate)
    newm = m_current - (mgradient*learningRate)
    return newm, newb

def basic_linear_regression(x, y):
    # Basic computations to save a little time.
    length = len(x)
    sum_x = sum(x)
    sum_y = sum(y)

    # sigma x^2, and sigma xy respectively.
    sum_x_squared = sum(map(lambda a: a * a, x))
    sum_of_products = sum([x[i] * y[i] for i in range(length)])

    # Magic formulae!  
    a = (sum_of_products - (sum_x * sum_y) / length) / (sum_x_squared - ((sum_x ** 2) / length))
    b = (sum_y - a * sum_x) / length
    return a, b

def error(m, b, datax, datay):
    # Mean of the signed residuals (note: not a squared error).
    error = 0
    for i in range(0, len(datax)):
        error += (datay[i] - (m*datax[i] + b))
    return error/len(datax)

def run():
    m = 0
    b = 0
    iterations = 1000
    learningRate = .00001
    for i in range(0, iterations):
        m, b = gradient(b, m, learningRate)

    print(m, b)
    # Compare against the closed-form least-squares fit.
    c, d = basic_linear_regression(datax, datay)
    print(c, d)
    gradientdescent = error(m, b, datax, datay)
    leastsquarederrors = error(c, d, datax, datay)
    print(gradientdescent)
    print(leastsquarederrors)
    plt.scatter(datax, datay)
    plt.plot([0, 300], [b, 300*m + b])
    plt.axis('equal')
    plt.show()

run() 

1 Answer


I have seen the learning rate sometimes taken in the range of 0.01. That could be why you need more than 1000 iterations: your learning rate is 0.00001, and unless your dataset is small, such a small rate converges very slowly. The smaller the learning rate, the more iterations are required to converge.
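
One quick way to see this effect (an illustrative sketch, not tuned values): run the same fixed-iteration loop with several learning rates and compare the resulting slope and intercept against the least-squares answer.

for lr in (0.00001, 0.0001, 0.001, 0.01):
    m, b = 0, 0
    for _ in range(1000):
        m, b = gradient(b, m, lr)  # same gradient() as in the question
    print(lr, m, b)  # compare with basic_linear_regression(datax, datay)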

Another thing I noticed is that you fix the number of iterations. You can never tell whether your cost function is at or near the global minimum at the 1000th iteration, and with such a low learning rate you may well need more than 1000. To deal with this, try a while loop: on each pass, compute the change in the cost function (delta J) and keep looping until delta J falls below a threshold, which is generally kept very low (in the range of 0.01 or 0.001). Then take the cost function once you break out of the loop and compare it with the one obtained from the OLS method. A rough sketch of this is below.
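
Here is a rough sketch of that convergence-based loop, assuming a mean-squared-error cost. It reuses the gradient() function from the question (which reads the module-level datax and datay lists); the learning rate, threshold, and iteration cap are illustrative values, not tuned ones.

def cost(m, b):
    # Mean squared error of the line y = m*x + b over the data.
    return sum((y - (m*x + b)) ** 2 for x, y in zip(datax, datay)) / len(datax)

def run_until_converged(learningRate=0.01, threshold=0.001, max_iterations=1000000):
    m, b = 0.0, 0.0
    previous = cost(m, b)
    iterations = 0
    while iterations < max_iterations:  # safety cap so the loop always terminates
        m, b = gradient(b, m, learningRate)
        current = cost(m, b)
        if abs(previous - current) < threshold:  # delta J below the threshold -> stop
            break
        previous = current
        iterations += 1
    return m, b

Once it returns, compute cost(m, b) for the gradient-descent coefficients and compare it with cost(c, d) for the coefficients from basic_linear_regression.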