My dataset has 32 categorical variables and one continuous numerical variable (sales_volume).
First I converted the categorical variables to binary columns with one-hot encoding (pd.get_dummies), and now I have 1294 columns, since every categorical variable has several categories.
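For reference, this is roughly what the encoding step looks like. It is only a minimal sketch on made-up data: the `region` and `promo` columns and all values are hypothetical, and only `sales_volume` comes from my real dataset.

```python
import pandas as pd

# Toy stand-in for the real data; "region" and "promo" are hypothetical columns.
df = pd.DataFrame({
    "region": ["north", "south", "east", "south"],
    "promo": ["yes", "no", "yes", "no"],
    "sales_volume": [120.0, 80.0, 150.0, 95.0],
})

cat_cols = ["region", "promo"]

# One binary column is created per category level, so the column count
# grows with the number of distinct values in each categorical variable.
encoded = pd.get_dummies(df, columns=cat_cols)

print(encoded.shape)
print(sorted(encoded.columns))
```

With 32 variables that each have many levels, the same step is what takes me to 1294 columns.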
Now I want to reduce the number of columns before applying any dimensionality reduction technique.
What is the best way to select the most informative variables?
For example, one categorical variable has two possible answers, 'yes' and 'no'. Is it possible for the 'yes' column to be significantly important while the 'no' column explains nothing? Would you drop the whole question (both the 'yes' and 'no' columns) or just the 'no' column?
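To make the yes/no case concrete, here is a small check on made-up data (the `promo` column and the numbers are hypothetical). As far as I can tell the two dummy columns are exact complements of each other, which is part of why I am unsure they can differ in importance:

```python
import pandas as pd

# Hypothetical binary question alongside the target.
df = pd.DataFrame({
    "promo": ["yes", "no", "yes", "no"],
    "sales_volume": [120.0, 80.0, 150.0, 95.0],
})

# Encode as floats so the columns can be summed and correlated directly.
d = pd.get_dummies(df, columns=["promo"], dtype=float)

# Every row has exactly one of the two columns set.
complementary = bool((d["promo_yes"] + d["promo_no"]).eq(1).all())

# Correlation of each dummy column with the target.
corr_yes = d["promo_yes"].corr(d["sales_volume"])
corr_no = d["promo_no"].corr(d["sales_volume"])
print(complementary, corr_yes, corr_no)

# drop_first=True keeps a single column per variable ("promo_yes" here,
# because "no" sorts first and is the level that gets dropped).
reduced = pd.get_dummies(df, columns=["promo"], drop_first=True)
print(list(reduced.columns))
```

In this toy case the two correlations have the same magnitude and opposite sign, so for a two-level variable one column seems to carry everything the other does. I am not sure how this generalizes to variables with more than two levels.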
Thanks in advance.