I found this thread from 2014, and the answer states that no, scikit-learn's random forest classifier cannot handle categorical variables (or at least not directly). Has the answer changed as of 2020?
I want to feed gender as a feature to my model. However, gender can take on three values: M, F, or np.nan. If I encode this column into three binary (one-hot) columns, how can the random forest classifier know that these three columns represent a single feature?
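A minimal sketch of the encoding described above, assuming pandas' `get_dummies` with `dummy_na=True` for the missing-value column and a small made-up dataset (the `age` column, the labels, and all values are hypothetical). It shows that after encoding, the fitted forest simply counts four independent numeric columns and keeps no record that three of them came from one original `gender` feature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 'gender' is M, F, or missing; 'age' is numeric.
df = pd.DataFrame({
    "gender": ["M", "F", np.nan, "F", "M", np.nan],
    "age": [23, 35, 41, 29, 52, 37],
})
y = [0, 1, 0, 1, 1, 0]

# One-hot encode gender; dummy_na=True adds an indicator column for missing values.
X = pd.get_dummies(df, columns=["gender"], dummy_na=True, dtype=int)
print(list(X.columns))  # ['age', 'gender_F', 'gender_M', 'gender_nan']

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# The forest sees 4 separate columns; nothing links the three gender_* columns.
print(clf.n_features_in_)  # 4

# One importance value per encoded column, not one for "gender" as a whole.
print(dict(zip(X.columns, clf.feature_importances_.round(3))))
```

If a single importance per original feature is wanted, one common workaround is to sum the importances of the `gender_*` columns after fitting; the model itself still treats them separately.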
Imagine max_features = 7. When training a given tree, it will randomly pick seven features. Suppose gender was chosen. If gender is split into three columns (gender_M, gender_F, gender_NA), will the random forest classifier always pick all three columns and count them as one feature, or is there a chance that it will pick only one or two?
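To put numbers on the "only one or two" case, here is a rough sketch. Two things are assumptions: the encoded data is taken to have 10 columns in total (a hypothetical figure), and each split is modeled as drawing `max_features` columns uniformly without replacement. In scikit-learn, `max_features` is the number of features considered when looking for the best split, so the draw happens at each split rather than once per tree. The real splitter can also inspect extra features if it finds no valid split, which this model ignores. Under these assumptions, the number of gender columns drawn follows a hypergeometric distribution, and a Monte Carlo run should agree with the exact formula.

```python
from math import comb

import numpy as np

n_columns = 10            # hypothetical total number of encoded columns
max_features = 7          # columns considered at each split
gender_cols = {0, 1, 2}   # hypothetical positions of gender_F, gender_M, gender_nan


def p_gender_cols_drawn(j, n=n_columns, k=max_features, g=len(gender_cols)):
    """Exact probability that exactly j of the g gender columns are among
    k columns drawn without replacement from n (hypergeometric pmf)."""
    return comb(g, j) * comb(n - g, k - j) / comb(n, k)


exact = {j: p_gender_cols_drawn(j) for j in range(len(gender_cols) + 1)}
print({j: round(p, 3) for j, p in exact.items()})
# {0: 0.008, 1: 0.175, 2: 0.525, 3: 0.292}

# Monte Carlo check: simulate many splits, each drawing max_features columns.
rng = np.random.default_rng(0)
trials = 20_000
counts = np.zeros(len(gender_cols) + 1)
for _ in range(trials):
    drawn = rng.choice(n_columns, size=max_features, replace=False)
    counts[len(gender_cols.intersection(drawn.tolist()))] += 1
simulated = counts / trials
print(simulated.round(3))
```

Under these assumptions, only about 29% of splits would see all three gender columns. Most splits see one or two, and each column is evaluated as a separate binary feature.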