2
votes

I want to implement simple classification tree (binary classification) using Mathematica.

How can I implement a binary tree in Mathematica? Is there is a symbol for doing that?

3
Leonid is the expert on data structures in Mma here. I remember he posted at least two good answers, but now I can find only one stackoverflow.com/questions/6097071/…. Try searching his answersDr. belisarius

3 Answers

2
votes

I'd say it depends on what you want to do with the data structure.

You can exploit the fact that Mathematica expressions themselves are trees.

If only the leaf nodes are relevant, then use nested lists, e.g. {{1, {2, 3}}, 4}. If the other node need to carry some data too, then you can use something like this:

tree[1][tree[2][a, b], tree[3][c, tree[4][d, e]]]

See the structure like this:

{{1, {2, 3}}, 4} // TreeForm
tree[1][tree[2][a, b], tree[3][c, tree[4][d, e]]] // TreeForm

The next question is how to implement algorithm X on such a data structure.

2
votes

Among the new objects in MMA 8 are TreeGraph, CompleteKaryTree, and KaryTree. The latter two objects give binary trees by default. I don't know how efficient they are for intensive computation but they do seem well-suited for displaying classifications. And there are many predicates and options for manipulating and displaying them.

Here's an example of a classification tree from [Breiman, L. Classification and Regression Trees: Chapman & Hall/CRC, 1984.]. It concerns 3 questions to determine whether a cardiac patient is likely to die within 30 days if not treated.

KaryTree[9, 2, 
   VertexLabels -> {1 -> "Blood pressure > 91 ?", 2 -> "Age > 62.5?", 
                    4 -> "Sinus tachycardia ?", 8 -> "< 30 days"}, 
   EdgeLabels -> {1 \[UndirectedEdge] 2 -> "yes", 
                  1 \[UndirectedEdge] 3 -> "no", 2 \[UndirectedEdge] 4 -> "yes", 
                  2 \[UndirectedEdge] 5 -> "no", 4 \[UndirectedEdge] 8 -> "yes", 
                  4 \[UndirectedEdge] 9 -> "no"}, ImagePadding -> 20]

Classification graph

I'd like to get rid of the two unused nodes on the right, but have not figured out a an elegant way to do it. So I think I'll post a simple question about that on SO.

1
votes

Personally I don't quite know, but there appears to be an article about the very subject on the Wolfram site, found here. It might not answer your question, but it will hopefully give you some insight!