I found threads about categorical variables in scikit-learn, but I could not find an easy answer. I realise that sklearn raises an error when building a decision tree on categorical data, and there are suggestions to use a vectorizer and so on. I tried everything, yet I am not able to create a decision tree. My table has a lot of columns with strings, and I tried a vectorizer, MultiLabelBinarizer, etc. Nothing seems to work. I am not able to call export_graphviz and display the tree, as there is no tree at all. I am pretty new to this and would appreciate help understanding how to handle these columns. I am splitting the data 80-20 into training and test sets, and then trying to build a tree. Here is a quick piece of code:
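from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

# train and test are the 80-20 splits of my table
dtree = DecisionTreeClassifier(random_state=0)
mlb = preprocessing.MultiLabelBinarizer()
n_train = mlb.fit_transform(train)
n_test = mlb.transform(test)
dec_tree = dtree.fit(n_train, n_test)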
This is the output I get, and I am confused by it:
DecisionTreeClassifier(class_weight=None, criterion='gini',
max_depth=None,
max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
random_state=0, splitter='best')
Please advise on how to proceed.