Linear regression implementation always performs worse than sklearn

Question

I implemented linear regression with gradient descent in python. To see how well it is doing I compared it with scikit-learn's LinearRegression() class. For some reason, sklearn always outperforms my program by a MSE of 3 on average (I am using the Boston Housing dataset for testing). I understand that I am currently not doing gradient checking to check for convergence, but I am allowing for many iterations and have set the learning rate low enough such that it SHOULD converge. Is there any clear bug in my learning algorithm implementation? Here is my code:

import numpy as np
from sklearn.linear_model import LinearRegression

def getWeights(x):
    lenWeights = len(x[1,:]);
    weights = np.random.rand(lenWeights)
    bias = np.random.random();
    return weights,bias

def train(x,y,weights,bias,maxIter):
    converged = False;
    iterations = 1;
    m = len(x);
    alpha = 0.001;
    while not converged:
            for i in range(len(x)):
                # Dot product of weights and training sample
                hypothesis = np.dot(x[i,:], weights) + bias;
                # Calculate gradient
                error = hypothesis - y[i];
                grad = (alpha * 1/m) * ( error * x[i,:] );
                # Update weights and bias
                weights = weights - grad;
                bias = bias - alpha * error;
                iterations = iterations + 1;

                if iterations > maxIter:
                    converged = True;
                    break

    return weights, bias

def predict(x, weights, bias):
    return np.dot(x,weights) + bias

if __name__ == '__main__':

    data = np.loadtxt('housing.txt');
    x = data[:,:-1];
    y = data[:,-1];
    for i in range(len(x[1,:])):
        x[:,i] = ( (x[:,i] - np.min(x[:,i])) / (np.max(x[:,i]) - np.min(x[:,i])) );

    initialWeights,initialBias = getWeights(x);
    weights,bias = train(x,y,initialWeights,initialBias,55000);
    pred = predict(x, weights,bias);
    MSE = np.mean(abs(pred - y));

    print "This Program MSE: " + str(MSE)

    sklearnModel = LinearRegression();
    sklearnModel = sklearnModel.fit(x,y);
    sklearnModel = sklearnModel.predict(x);

    skMSE = np.mean(abs(sklearnModel - y));

    print "Sklearn MSE: " + str(skMSE)

user1149913 user1149913 · Accepted Answer · 2014-02-08T06:44:35

First, make sure that you are computing the correct objective function value. The linear regression objective should be .5*np.mean((pred-y)**2), rather than np.mean(abs(pred - y)).

You are actually running a stochastic gradient descent (SGD) algorithm (running a gradient iteration on individual examples), which should be distinguished from "gradient descent".

SGD is a good learning method, but a bad optimization method - it can take many iterations to converge to a minimum of the empirical error (http://leon.bottou.org/publications/pdf/nips-2007.pdf).

For SGD to converge, the learning rate must be restricted. Typically, the learning rate is set to the base learning rate divided by the number of iterations, something like alpha/(iterations+1), using the variables in your code.

You also include a multiple of 1/m in your gradient, which is typically not used in SGD updates.

To test your SGD implementation, rather than evaluating the error on the dataset that you trained with, split the dataset into a training set and a test set, and evaluate the error on this test set after training with both methods. The training/test set split will allow you to estimate the performance of your algorithm as a learning algorithm (estimate the expected error) rather than as an optimization algorithm (minimize the empirical error).

Linear regression implementation always performs worse than sklearn

2 Answers