
I have a problem where each input can carry more than one label, i.e. a multi-label classification problem. I used scikit-learn's Decision Tree classifier for this and it gives pretty good results at this early stage. But I am wondering how it works under the hood: how is the split chosen in a decision tree for multi-label classification? The key question is how a single model, initialized once, can be trained on two different sets of labels at the same time. How does the decision tree solve the optimization task for both label sets simultaneously?
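To make the setup concrete, here is a minimal toy sketch of the kind of call I mean (not my real data): scikit-learn's DecisionTreeClassifier accepts a 2-D label matrix, so a single fitted tree covers all label columns at once.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])   # toy features
y = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])   # two binary labels per row

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)                # y has shape (n_samples, n_labels)
print(model.predict(X).shape)  # (4, 2): one prediction per label column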

It works the same as for a 'normal' classification problem. – PV8
@PV8 I know it gives more or less the same result, but my question is how one model, initialized once, can be trained on two different sets of labels at the same time. – Urvish

1 Answer


Under the hood, every node in your decision tree carries the same set of labels as the root node; what differs from node to node is the probability assigned to each label. When you call model.predict(), the model returns the label with the highest probability; model.predict_proba() shows the probability of each label separately. You can use this code to get the probabilities with their matching label names:

import pandas as pd
all_probs = pd.DataFrame(model.predict_proba(X_test), columns=model.classes_)
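Note that if the tree was fitted on a 2-D label matrix (the multi-label case from the question), predict_proba returns a list with one probability array per label column and model.classes_ is a list of class arrays, so the one-liner needs a small loop. A sketch under that assumption, reusing model and X_test from above:

# For a multi-output tree, predict_proba returns a list with one
# (n_samples, n_classes) array per label column, and model.classes_
# is a matching list of class arrays.
probas = model.predict_proba(X_test)
per_label_probs = [pd.DataFrame(p, columns=c) for p, c in zip(probas, model.classes_)]
# per_label_probs[i] holds the class probabilities for the i-th label column.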