2 votes

I have tried to implement gradient descent, and it worked properly when I tested it on a sample dataset, but it is not working for the Boston dataset.

Can you verify what's wrong with the code? Why am I not getting a correct theta vector?

import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X = load_boston().data
y = load_boston().target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train1 = np.c_[np.ones((len(X_train), 1)), X_train]
X_test1 = np.c_[np.ones((len(X_test), 1)), X_test]

eta = 0.0001
n_iterations = 100
m = len(X_train1)
tol = 0.00001

theta = np.random.randn(14, 1)

for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train)
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)

I'm getting my weight vector with shape (14, 354). What am I doing wrong here?


2 Answers

1 vote

Consider this (I unrolled a few statements for better readability):

for i in range(n_iterations):
    y_hat = X_train1.dot(theta)              # shape (n_samples, 1)
    error = y_hat - y_train[:, None]         # both operands are (n_samples, 1), so error stays (n_samples, 1)
    gradients = 2/m * X_train1.T.dot(error)  # shape (14, 1)

    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)

Since y_hat has shape (n_samples, 1) while y_train has shape (n_samples,) (for your example n_samples is 354), NumPy broadcasting expands the subtraction into a (354, 354) array, and the following dot product is what gives you a (14, 354) theta. You need to bring y_train to the same shape with the dummy-axis trick y_train[:, None].
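
You can see the broadcasting at work with a standalone sketch that uses dummy arrays of the same shapes (not your actual data):

import numpy as np

y_hat = np.zeros((354, 1))    # stand-in for X_train1.dot(theta), ndim=2
y_train = np.zeros(354)       # stand-in for the real y_train, ndim=1

print((y_hat - y_train).shape)           # (354, 354): the 1-D array broadcasts along a new axis
print((y_hat - y_train[:, None]).shape)  # (354, 1): the dummy axis makes the shapes line up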

1 vote

y_train here is a 1-dimensional NumPy array (ndim=1), whereas X_train1.dot(theta) is a 2-D NumPy array (ndim=2) of shape (354, 1). When you do the subtraction, broadcasting silently expands the result to shape (354, 354) instead of raising an error, and the subsequent X_train1.T.dot(...) then produces the (14, 354) theta you are seeing. To address this, convert y_train to a 2-D column vector as well. You can do this with y_train.reshape(-1, 1).

for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train.reshape(-1,1))
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)
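
As a side note, y_train.reshape(-1, 1) and the y_train[:, None] from the other answer build the same (354, 1) column vector, so either fix works. Here is a minimal sanity-check sketch, assuming the arrays and hyperparameters from your question are already defined; the gradient-norm stopping test is my own guess at what the original np.linalg.norm(X_train1) check was meant to do, since the norm of X_train1 never changes and so can never trigger the break:

assert np.array_equal(y_train.reshape(-1, 1), y_train[:, None])

theta = np.random.randn(14, 1)
for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train.reshape(-1, 1))
    # stop once the gradient itself is small (assumed intent of the original check)
    if np.linalg.norm(gradients) < tol:
        break
    theta = theta - (eta * gradients)

print(theta.shape)  # (14, 1), as expected once the shapes line up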