
I'm trying to visualize how a decision tree is applied to a test set, using Weka's J48 decision tree. Weka assigns each sample a class by walking the tree until it reaches a leaf, and of course several leaves can carry the same class. Does anybody know how to get Weka to report, for each sample, which leaf it used to classify that sample? For example, given a tree like this:

X < 47
|  Y > 10  : Class1 (...)
|  Y <= 10 : Class2 (...)
X >= 47
|  Y > 15  : Class1 (...)
|  Y <= 15
|  |  Z > 10  : Class2 (...)
|  |  Z <= 10 : Class1 (...)

I'd like something that would say "sample 15 was classified as Class1 because (X>=47, Y<=15, Z<=10)" or something like that.

Alternatively, I'd like something that says "27 samples were classified as Class1 because (X>=47, Y<=15, Z<=10)".

Alter-alternatively, does anybody know of instances where someone has visualized this info, or of other software that does spit this info out? Thanks.


1 Answer


I'd still like a real answer if anyone knows one, but as far as I can tell Weka does not have this ability. My solution was to write a tool that does what I need. It's available here:

GitHub:DecisionTreeDNA

I'm not done with it yet. It's going to build a graph from the numbers, but it already prints the numbers I wanted. Its output is of the "27 samples were classified as Class1 because (X>=47, Y<=15, Z<=10)" variety, but it's easy to modify so that it prints "sample 15 was classified as Class1 because (X>=47, Y<=15, Z<=10)" instead.
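
For anyone who only wants the idea, here is a minimal sketch in Python of both kinds of output. It does not call Weka and is not the code from the tool above: the tree is hand-copied from the question, and `TREE`, `trace`, and the sample values are names and data made up for illustration.

```python
import operator
from collections import Counter

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Hand-written copy of the example tree from the question (an assumption:
# in practice you would build this from the model, not type it in).
# A node is a list of branches (attribute, operator, threshold, child);
# a child is either another node or a class label string (a leaf).
TREE = [
    ("X", "<", 47, [
        ("Y", ">", 10, "Class1"),
        ("Y", "<=", 10, "Class2"),
    ]),
    ("X", ">=", 47, [
        ("Y", ">", 15, "Class1"),
        ("Y", "<=", 15, [
            ("Z", ">", 10, "Class2"),
            ("Z", "<=", 10, "Class1"),
        ]),
    ]),
]


def trace(sample, node=TREE):
    """Return (class_label, path), where path is the tuple of tests passed."""
    path = []
    while not isinstance(node, str):
        for attr, op, threshold, child in node:
            if OPS[op](sample[attr], threshold):
                path.append(f"{attr}{op}{threshold}")
                node = child
                break
        else:
            raise ValueError(f"no branch matches {sample}")
    return node, tuple(path)


# Made-up samples, purely for illustration.
samples = [
    {"X": 50, "Y": 12, "Z": 3},
    {"X": 60, "Y": 5, "Z": 8},
    {"X": 20, "Y": 30, "Z": 0},
    {"X": 47, "Y": 15, "Z": 11},
]

# Per-sample report: which leaf (i.e. which path) produced the label.
for i, s in enumerate(samples):
    label, path = trace(s)
    print(f"sample {i} was classified as {label} because ({', '.join(path)})")

# Aggregate report: how many samples ended up at each leaf.
leaf_counts = Counter(trace(s) for s in samples)
for (label, path), n in leaf_counts.items():
    print(f"{n} samples were classified as {label} because ({', '.join(path)})")
```

The path doubles as the leaf's identity, so two leaves with the same class label still get separate counts.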