
Can I use OneHotEncoder to one-hot encode only selected values in a categorical column, instead of constructing a dense ndarray and then dropping the unnecessary columns?

For example, if a categorical column named color has three values, say red, green and blue, and I want to one-hot encode only two of them, say "red" and "green", how would I do that with OneHotEncoder?

I am new to ML, so I would also like to know whether this kind of limited one-hot encoding is advisable and common, especially for a column with a very large number of categorical values.

Thanks in advance.

Are you dropping only one column, or multiple columns? - Jeff
You either one-hot encode a column using all of its values or you don't encode it at all. If you are trying to one-hot encode some values and keep the other values as they are, I don't see how you would be able to use them in ML. - Aniket Bote

1 Answer


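If you're only looking to drop one of the categories in each column so that you're fitting against a baseline, you can pass the drop parameter when constructing the OneHotEncoder.

This drops the first category of each column. By default that is the first category in sorted order (the first entry of categories_), not necessarily the first value encountered in the data: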

from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(drop='first')

If you are encoding two columns and want to leave one specific value unencoded in each of them, pass a list with one category per column:

dropList = ['Blue','Triangle']
ohe = OneHotEncoder(drop=dropList)

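This will not encode 'Blue' in the first column or 'Triangle' in the second column. The list must contain exactly one category per column, and each one must actually occur in that column (matching case), otherwise fitting raises an error.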