2 votes

I have tried to implement gradient descent, and it worked properly when I tested it on a sample dataset, but it is not working for the Boston dataset.

Can you verify what's wrong with the code? Why am I not getting a correct theta vector?

import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X = load_boston().data
y = load_boston().target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train1 = np.c_[np.ones((len(X_train), 1)), X_train]
X_test1 = np.c_[np.ones((len(X_test), 1)), X_test]

eta = 0.0001
n_iterations = 100
m = len(X_train1)
tol = 0.00001

theta = np.random.randn(14, 1)

for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train)
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)

I'm getting my weight vector with shape (14, 354). What am I doing wrong here?


2 Answers

1 vote

Consider this (I unrolled a few statements for better readability):

for i in range(n_iterations):
    y_hat = X_train1.dot(theta)              # shape (n_samples, 1)
    error = y_hat - y_train[:, None]         # both operands are (n_samples, 1), so error stays (n_samples, 1)
    gradients = 2/m * X_train1.T.dot(error)  # shape (14, 1)

    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)

Since y_hat has shape (n_samples, 1) while y_train has shape (n_samples,) (for your example n_samples is 354), NumPy broadcasting expands the subtraction into a (354, 354) array, and the following dot product is what gives you a (14, 354) theta. You need to bring y_train to the same shape with the dummy-axis trick y_train[:, None].
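
You can see the broadcasting at work with a standalone sketch that uses dummy arrays of the same shapes (not your actual data):

import numpy as np

y_hat = np.zeros((354, 1))    # stand-in for X_train1.dot(theta), ndim=2
y_train = np.zeros(354)       # stand-in for the real y_train, ndim=1

print((y_hat - y_train).shape)           # (354, 354): the 1-D array broadcasts along a new axis
print((y_hat - y_train[:, None]).shape)  # (354, 1): the dummy axis makes the shapes line up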

1 vote

y_train here is a 1-dimensional NumPy array (ndim=1), whereas X_train1.dot(theta) is a 2-D NumPy array (ndim=2) of shape (354, 1). When you do the subtraction, broadcasting silently expands the result to shape (354, 354) instead of raising an error, and the subsequent X_train1.T.dot(...) then produces the (14, 354) theta you are seeing. To address this, convert y_train to a 2-D column vector as well. You can do this with y_train.reshape(-1, 1).

for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train.reshape(-1,1))
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)
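
As a side note, y_train.reshape(-1, 1) and the y_train[:, None] from the other answer build the same (354, 1) column vector, so either fix works. Here is a minimal sanity-check sketch, assuming the arrays and hyperparameters from your question are already defined; the gradient-norm stopping test is my own guess at what the original np.linalg.norm(X_train1) check was meant to do, since the norm of X_train1 never changes and so can never trigger the break:

assert np.array_equal(y_train.reshape(-1, 1), y_train[:, None])

theta = np.random.randn(14, 1)
for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train.reshape(-1, 1))
    # stop once the gradient itself is small (assumed intent of the original check)
    if np.linalg.norm(gradients) < tol:
        break
    theta = theta - (eta * gradients)

print(theta.shape)  # (14, 1), as expected once the shapes line up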