1
votes

I have a column (actually a couple of columns) that holds mixed data: string labels and numerical values, where each numerical value corresponds to a different category. The columns should be purely categorical in nature. My final goal is to give them a one-hot encoding representation.

The numerical values in the column are mostly zeros. I want to convert this column to categorical. I don't know a direct way to go from mixed data to a one-hot encoding with get_dummies(), so I first converted the column entirely to numerical values and then one-hot encoded it.

The image below represents my scenario.

[image: mixed_column]

Is there a better approach? Is there a way to convert the data to categorical directly?

Any help is appreciated.

1
Do the numerical values correspond to different categories, or should they all be considered a single category? A more elaborate example input plus the desired output would help a lot. - Adam.Er8
The numerical values correspond to different categories. - Eswar

1 Answer

0
votes

The code below one-hot encodes a column that mixes integer and string values. It passes the column straight to get_dummies(), which is the most direct route. If that does not work for your data, consider using another library for categorical encoding.

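import pandas as pd
# One column holding both integers and a string label
data = {'Column 1': [1, 2, 'a']}
df = pd.DataFrame(data)
# dtype=int gives 0/1 indicators; pandas 2.0+ otherwise defaults to True/False
print(pd.get_dummies(df, columns=['Column 1'], dtype=int))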

Output:

   Column 1_1  Column 1_2  Column 1_a
0           1           0           0
1           0           1           0
2           0           0           1
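
If you would rather make the column explicitly categorical before encoding, as the question asks, one possible sketch is to cast every value to a string and then to the category dtype. The sample values and the column name mixed_column below are made up for illustration (the original data is only shown as an image), and the sketch assumes each distinct number should count as its own category, as stated in the comments.

```python
import pandas as pd

# Hypothetical sample: mostly zeros, with other numbers and string labels mixed in
df = pd.DataFrame({"mixed_column": [0, 0, "a", 3, 0, "b"]})

# Cast everything to str first so the integer 0 and the string "0" become one label,
# then mark the column as categorical
df["mixed_column"] = df["mixed_column"].astype(str).astype("category")

# One indicator column per category; dtype=int keeps 0/1 values on any pandas version
encoded = pd.get_dummies(df, columns=["mixed_column"], dtype=int)
print(encoded)
```

With this sample the result has one column per distinct label (mixed_column_0, mixed_column_3, mixed_column_a, mixed_column_b), and each row has exactly one 1. The string cast is a design choice: it avoids treating 0 and "0" as two different categories, which can happen when a mixed column is read from a file.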