Let's assume that I have a pandas DataFrame with the following columns:
- 'age' (e.g. 33, 26, 51, etc.)
- 'seniority' (e.g. 'junior', 'senior', etc.)
- 'gender' (e.g. 'male', 'female')
- 'salary' (e.g. 32000, 40000, 64000, etc.)
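To make the setup reproducible, here is a minimal sketch of such a DataFrame. The rows are hypothetical and built only from the example values listed above.

```python
import pandas as pd

# Hypothetical sample rows built from the example values in the question
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["junior", "junior", "senior"],
    "gender": ["male", "female", "male"],
    "salary": [32000, 40000, 64000],
})
print(data.dtypes)
```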
I want to one-hot encode the categorical 'seniority' column. To do so, I am doing the following:
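```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Convert the 'seniority' strings to integer labels
label_encoder = LabelEncoder()
data['seniority'] = label_encoder.fit_transform(data['seniority'])

# One-hot encode column 1 ('seniority') only
# (categorical_features was deprecated in scikit-learn 0.20 and removed in 0.22)
one_hot_encoder = OneHotEncoder(categorical_features=[1])
data = one_hot_encoder.fit_transform(data.values)
```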
But then I get the following error on the line data = one_hot_encoder.fit_transform(data.values):

```
ValueError: could not convert string to float: 'gender'
```
However, I explicitly specified categorical_features=[1], so only column 1 ('seniority') should be used for the one-hot encoding.
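A likely explanation, as far as I can tell: in older scikit-learn versions, categorical_features only selected which columns get encoded, while the whole array passed to fit_transform was still converted to float, so the string column 'gender' fails. A minimal sketch, assuming scikit-learn 0.20 or later (where OneHotEncoder accepts string categories directly, so LabelEncoder is unnecessary), is to fit the encoder on the 'seniority' column alone and join the result back. The sample rows and the 'seniority_' column prefix are hypothetical choices for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample rows built from the example values in the question
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["junior", "junior", "senior"],
    "gender": ["male", "female", "male"],
    "salary": [32000, 40000, 64000],
})

# Fit the encoder on the 'seniority' column only (a 2-D selection is required)
encoder = OneHotEncoder()
encoded = encoder.fit_transform(data[["seniority"]]).toarray()  # sparse -> dense

# Name the new columns after the learned (sorted) categories
columns = [f"seniority_{c}" for c in encoder.categories_[0]]
encoded_df = pd.DataFrame(encoded, columns=columns, index=data.index)

# Replace 'seniority' with its one-hot columns; 'gender' is left untouched
result = pd.concat([data.drop(columns="seniority"), encoded_df], axis=1)
print(result)
```

Because the encoder never sees the other columns, 'gender' does not need to be numeric; it could be encoded the same way later if needed.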
How can I fix this error (other than, for example, by dropping the 'gender' column)?
I have used pandas.get_dummies in the past and did not have this problem.
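For comparison, a minimal pandas.get_dummies sketch on the same hypothetical rows. The columns parameter restricts the expansion to 'seniority', and the remaining columns, including the string column 'gender', are passed through unchanged.

```python
import pandas as pd

# Hypothetical sample rows built from the example values in the question
data = pd.DataFrame({
    "age": [33, 26, 51],
    "seniority": ["junior", "junior", "senior"],
    "gender": ["male", "female", "male"],
    "salary": [32000, 40000, 64000],
})

# Only 'seniority' is expanded into indicator columns; the others are kept as-is
result = pd.get_dummies(data, columns=["seniority"])
print(result)
```

Note that the dtype of the indicator columns depends on the pandas version (bool from pandas 2.0 on, uint8 before), so casting with astype(int) may be useful if numeric 0/1 values are required.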