I am trying to build a house price prediction model with scikit-learn's LinearRegression, and I am getting a negative score on the test data.
What am I doing wrong?
Dataset details:
Shape of dataframe: (23435, 190)
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score
properties_five = pd.read_csv('house_test.csv')
X = properties_five.drop('price', axis='columns')
y = properties_five['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
lr_clf = LinearRegression()
lr_clf.fit(X_train, y_train)
print(lr_clf.score(X_train,y_train))
print(lr_clf.score(X_test,y_test))
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
print(cross_val_score(LinearRegression(), X, y, cv=cv))
Output:
score on training data: 0.0025884591059242013
score on test data: -1.6566338615525985e+24
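For context, as far as I understand it, `score` on a `LinearRegression` returns the coefficient of determination R^2 = 1 - SS_res / SS_tot, so it has no lower bound and goes negative whenever the predictions are worse than always predicting the mean. Below is a minimal sketch of that formula in plain Python. The helper `r2` and all the numbers are made up for illustration; they are not from my dataset and this is not scikit-learn's implementation.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical prices and three hypothetical sets of predictions.
y_true = [100.0, 200.0, 300.0]
good = [110.0, 190.0, 310.0]        # close to the truth
mean_only = [200.0, 200.0, 200.0]   # always predicts the mean
wild = [1e6, -1e6, 1e6]             # numerically exploded predictions

r2_good = r2(y_true, good)       # close to 1
r2_mean = r2(y_true, mean_only)  # exactly 0
r2_wild = r2(y_true, wild)       # very large negative number

print(r2_good, r2_mean, r2_wild)
```

So a test score around -1e24 would suggest that at least some test predictions are off by many orders of magnitude, not merely that the fit is weak.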
