
I have several categorical variables with a high number of classes. I used one-hot encoding to convert them into 1/0 format.

original:

column_1    column_2
0.8         X
0.3         C
0.9         D
1.2         C

one-hot encoded:

column_1    column_2_X    column_2_C    column_2_D
0.8         1             0             0
0.3         0             1             0
0.9         0             0             1
1.2         0             1             0
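For reference, something like the following pandas snippet reproduces this encoding step; the frame contents mirror the example above, and pd.get_dummies is just one common way to do it:

import pandas as pd

# Toy frame matching the example above.
df = pd.DataFrame({
    "column_1": [0.8, 0.3, 0.9, 1.2],
    "column_2": ["X", "C", "D", "C"],
})

# One-hot encode column_2: each class becomes its own indicator column
# (column_2_C, column_2_D, column_2_X).
encoded = pd.get_dummies(df, columns=["column_2"])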

Then I checked the feature_importances of the resulting dummy columns.

For example, column_2_C has no importance to the model, but the other dummies that come from the same original column (column_2) have significant importance.
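Roughly, the importance check looks like this (a minimal sketch assuming a scikit-learn tree ensemble; the RandomForestRegressor and the placeholder target y are illustrative only, not my actual setup):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# One-hot encoded frame from above plus a made-up target, for illustration.
X = pd.DataFrame({
    "column_1":   [0.8, 0.3, 0.9, 1.2],
    "column_2_X": [1, 0, 0, 0],
    "column_2_C": [0, 1, 0, 1],
    "column_2_D": [0, 0, 1, 0],
})
y = [10.0, 5.0, 12.0, 7.0]

model = RandomForestRegressor(random_state=0).fit(X, y)

# One importance value per column, including each dummy separately.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))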

In this case, or in any similar case (say, 50% of the classes have high importance and 50% have very low importance), what should I do? What if column_2_C is crucially significant but the others (X and D) have no importance at all?

What happens if I remove that class? Is there any best practice for this kind of case?

Thanks in advance,


1 Answer


If you are using the dummy variables in a model, then removing the non-significant variables or non-confounders is appropriate. However, if you are retaining the original categorical variable, you should not delete those observations from your sample. I would need more information about what you are doing.
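As a minimal sketch of what that can look like in practice (the column names come from your question; the model, target, and refit step are assumptions for illustration only), note that dropping the uninformative dummy keeps every row in the sample, since observations of class C simply map to the all-zeros pattern of the remaining column_2_* dummies:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

X = pd.DataFrame({
    "column_1":   [0.8, 0.3, 0.9, 1.2],
    "column_2_X": [1, 0, 0, 0],
    "column_2_C": [0, 1, 0, 1],
    "column_2_D": [0, 0, 1, 0],
})
y = [10.0, 5.0, 12.0, 7.0]  # placeholder target

# Drop the dummy column, not the rows: all four observations are kept,
# and class C now shows up as zeros in the remaining dummies.
X_reduced = X.drop(columns=["column_2_C"])
model = RandomForestRegressor(random_state=0).fit(X_reduced, y)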