I have a list of column names, and I want to one-hot encode the values in those columns, i.e. encode the categorical variables in my dataset. I tried a few approaches, but each one throws an error:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# trainig_set_ed holds the DataFrame loaded from my .csv file
edited_training_set = 'edited_dataset/test_set.csv'
trainig_set_ed = pd.read_csv(edited_training_set)
column_header = ['cat_var_1','cat_var_2','cat_var_3','cat_var_4','cat_var_5','cat_var_6',
'cat_var_7','cat_var_8','cat_var_9','cat_var_10','cat_var_11','cat_var_12','cat_var_13',
'cat_var_14','cat_var_15','cat_var_16','cat_var_17','cat_var_18']
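# one LabelEncoder per categorical column
clfs = {c: LabelEncoder() for c in column_header}
for col, clf in clfs.items():
    # fit the column's encoder and replace its string values with integer codes
    trainig_set_ed[col] = clfs[col].fit_transform(trainig_set_ed[col])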
trainig_set_ed.to_csv('edited_dataset/train_set_encode.csv',sep='\t',encoding='utf-8')
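Note that `LabelEncoder` produces one integer code per category, which is label encoding rather than one-hot encoding. The sketch below, which uses a small hypothetical in-memory DataFrame in place of the CSV (the sample values and the absent `cat_var_3` are assumptions for illustration), shows both: it first filters the column list down to names that actually exist, reports the rest, and then applies either per-column `LabelEncoder` or `pandas.get_dummies` for true one-hot columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data standing in for the CSV; cat_var_3 is deliberately absent.
df = pd.DataFrame({
    "cat_var_1": ["a", "b", "a"],
    "cat_var_2": ["x", "x", "y"],
    "num_var": [1.0, 2.5, 3.0],
})
column_header = ["cat_var_1", "cat_var_2", "cat_var_3"]

# Only encode columns that really exist; report the others instead of raising KeyError.
present = [c for c in column_header if c in df.columns]
missing = [c for c in column_header if c not in df.columns]
if missing:
    print("columns not found in the data: %s" % missing)

# Option 1: label encoding (each category becomes an integer in the same column).
label_encoded = df.copy()
encoders = {}
for col in present:
    enc = LabelEncoder()
    label_encoded[col] = enc.fit_transform(label_encoded[col].astype(str))
    encoders[col] = enc  # keep the fitted encoder for inverse_transform later

# Option 2: one-hot encoding (one indicator column per category, e.g. cat_var_1_a).
one_hot = pd.get_dummies(df, columns=present)
print(one_hot.columns.tolist())
```

`get_dummies` drops each original categorical column and appends its indicator columns; depending on the pandas version the indicators are 0/1 integers or booleans. Keeping the fitted encoders in a dict lets you map codes back with `encoders[col].inverse_transform(...)`.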
The error it throws:
Traceback (most recent call last):
  File "preprocessing.py", line 83, in <module>
    trainig_set_ed[col] = clfs[col].fit_transform(trainig_set_ed[col])
  File "/root/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 3838, in get
    loc = self.items.get_loc(item)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2524, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cat_var_6'
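A `KeyError` raised from `get_loc` means the DataFrame has no column with exactly that name. One way to narrow it down is to compare the wanted names against `df.columns`. The sketch below assumes one common cause, stray whitespace in the CSV header (the sample header is hypothetical); another possibility worth checking is a separator mismatch, e.g. a tab-separated file read with the default comma separator, which collapses the whole header into a single column.

```python
import io

import pandas as pd

# Hypothetical CSV whose header has extra spaces before cat_var_6,
# so the column looks right in an editor but is really named "  cat_var_6".
raw = "cat_var_5,  cat_var_6\np,q\nr,s\n"
df = pd.read_csv(io.StringIO(raw))

wanted = ["cat_var_5", "cat_var_6"]

# Names pandas cannot find before cleaning the header.
missing_before = [c for c in wanted if c not in df.columns]
print(df.columns.tolist(), missing_before)

# Strip surrounding whitespace from every header name, then check again.
df.columns = df.columns.str.strip()
missing_after = [c for c in wanted if c not in df.columns]
print(df.columns.tolist(), missing_after)
```

Printing `df.columns.tolist()` on the real file is usually the quickest check; if it shows a single long name containing tabs, passing `sep='\t'` to `read_csv` is the likely fix.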
Thanks!
training_set_ed is actually a str, not a pandas object. Can you share the code you’re using to create it? – dantiston
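To follow up on the comment's point, a small guard can make the mix-up between a file path string and the loaded DataFrame fail with a clear message before any column lookup. This is a minimal sketch; `require_dataframe` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def require_dataframe(obj):
    # Hypothetical helper: raise a readable error if a path string (or anything
    # else) was passed where a loaded DataFrame is expected.
    if not isinstance(obj, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame, got %s" % type(obj).__name__)
    return obj


# A DataFrame passes through unchanged.
frame = require_dataframe(pd.DataFrame({"cat_var_1": ["a", "b"]}))

# A path string is rejected before it can be indexed by column name.
try:
    require_dataframe("edited_dataset/test_set.csv")
except TypeError as exc:
    print(exc)
```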