I am working on the Titanic survival challenge on Kaggle (https://www.kaggle.com/c/titanic).
I am not experienced in R, so I am using Python and scikit-learn for the Random Forest classifier.
I see many people using scikit-learn convert their categorical features with many levels into dummy variables.
I don't understand the point of doing this. Why can't we just map the levels to numeric values and be done with it?
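To make it concrete, here is a rough sketch of the two approaches as I understand them, using the Embarked column as an example (the mapping values are just ones I picked arbitrarily):

```python
import pandas as pd

df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# Approach 1: map each level to an integer (what I would naively do)
df["Embarked_num"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})

# Approach 2: one-hot / dummy encoding (what I see others doing)
dummies = pd.get_dummies(df["Embarked"], prefix="Embarked")
df = pd.concat([df, dummies], axis=1)

print(df)
```

Both give the classifier numeric input, so I don't see why the second is preferred.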
I also saw someone do the following: there was a categorical feature Pclass with three levels; he created 3 dummy variables for it and then dropped the dummy corresponding to the level with the lowest survival rate. I couldn't understand this either; I thought decision trees didn't care about correlated features.
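If I followed his notebook correctly, the step looked roughly like this (I am assuming here that Pclass 3 was the level with the lowest survival rate, which is why it gets dropped):

```python
import pandas as pd

df = pd.DataFrame({"Pclass": [1, 2, 3, 3, 1]})

# Create one dummy column per level of Pclass...
dummies = pd.get_dummies(df["Pclass"], prefix="Pclass")

# ...then drop the dummy for the level with the lowest survival rate
# (assumed to be Pclass 3), leaving only Pclass_1 and Pclass_2
dummies = dummies.drop(columns=["Pclass_3"])

df = pd.concat([df, dummies], axis=1)
```

Why drop one of the dummies at all for a tree-based model?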