I'm working on the well-known Kaggle challenge "House Prices". I want to train a model on my dataset with sklearn.linear_model.LinearRegression.
After reading the following article: https://developers.google.com/machine-learning/crash-course/representation/feature-engineering
I wrote a function that converts every string value in my train DataFrame into a list (a one-hot vector). For example, if a feature originally takes the values [Ex, Gd, Ta, Po], then after the conversion those values become [1,0,0,0], [0,1,0,0], [0,0,1,0] and [0,0,0,1] respectively.
When I try to fit the model, I get the following error:
    Traceback (most recent call last):
      File "C:/Users/Owner/PycharmProjects/HousePrices/main.py", line 27, in <module>
        linereg.fit(train_df, target)
      File "C:\Users\Owner\PycharmProjects\HousePrices\venv\lib\site-packages\sklearn\linear_model\base.py", line 458, in fit
        y_numeric=True, multi_output=True)
      File "C:\Users\Owner\PycharmProjects\HousePrices\venv\lib\site-packages\sklearn\utils\validation.py", line 756, in check_X_y
        estimator=estimator)
      File "C:\Users\Owner\PycharmProjects\HousePrices\venv\lib\site-packages\sklearn\utils\validation.py", line 567, in check_array
        array = array.astype(np.float64)
    ValueError: setting an array element with a sequence.
This only happens when I convert some columns as I explained.
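As far as I can tell, the same error can be reproduced without sklearn. The sketch below is only an illustration: `toy_df` and the column name `Quality_vec` are made up for this example and are not part of my real data. It builds a DataFrame whose cells are lists and then attempts the same `astype(np.float64)` call that appears at the bottom of the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a single column whose cells are Python lists,
# which is what my conversion function produces.
toy_df = pd.DataFrame(
    {"Quality_vec": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]}
)

# The underlying array has dtype object and shape (3, 1):
# each cell holds a whole list, not a number.
values = toy_df.values

try:
    # Same conversion that sklearn's check_array attempts.
    values.astype(np.float64)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(values.dtype, values.shape)
print(error_message)
```

So the failure seems to come from the float conversion of list-valued cells rather than from the regression itself.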
Is there any way to train a linear regression model when the feature values are vectors?
This is my conversion function:
    def feature_to_boolean_vector(df, feature_name, new_name):
        vectors_list = []  # each list will represent an option
        feature_options = df[feature_name].unique()
        feature_options_length = len(feature_options)
        # creating a list the size of feature_options_length, all 0's
        list_to_be_vector = [0 for i in range(feature_options_length)]
        for i in range(feature_options_length):
            list_to_be_vector[i] = 1  # inserting 1 representing option number i
            vectors_list.append(list_to_be_vector.copy())
            list_to_be_vector[i] = 0
        mapping = dict(zip(feature_options, vectors_list))  # dict from values to vectors
        df[new_name] = df[feature_name].map(mapping)
        df.drop([feature_name], axis=1, inplace=True)
And this is my train attempt (after pre-processing):
    from sklearn.linear_model import LinearRegression

    linereg = LinearRegression()
    linereg.fit(train_df, target)
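For comparison, here is a minimal sketch of an alternative I am considering, assuming the one-hot vector should be spread across separate 0/1 columns (one per category) instead of being stored as a list in a single column. It uses `pandas.get_dummies`; the toy data, the column names `Area` and `Quality`, and the target numbers are all hypothetical and stand in for `train_df` and `target`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for train_df and target.
toy_df = pd.DataFrame({
    "Area": [8450, 9600, 11250, 9550],
    "Quality": ["Ex", "Gd", "Ta", "Po"],
})
toy_target = pd.Series([208500, 181500, 223500, 140000])

# One numeric 0/1 column per category, so every cell is a scalar
# and the frame can be converted to a float array.
encoded_df = pd.get_dummies(toy_df, columns=["Quality"], dtype=float)

model = LinearRegression()
model.fit(encoded_df, toy_target)

print(encoded_df.columns.tolist())
print(model.coef_.shape)
```

With this layout each category gets its own coefficient, which I assume is what the article means by a one-hot encoding.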
Thank you in advance.