6 votes

Can anyone recommend a decision tree classifier implementation, in either Python or Java, that can be used incrementally?

All the implementations I've found require you to provide all the features to the classifier at once in order to get a classification. However, in my application, I have hundreds of features, and some of the features are functions that can take a long time to evaluate. Since not all branches of the tree may use all the features, it doesn't make sense to give the classifier all the features at once. I'd like the classifier to ask for features, one at a time, in the order that they are needed to make the maximum reduction in entropy and provide a final classification.

Note, this is a similar question: stackoverflow.com/questions/3411279/… – Cerin

6 Answers

3 votes

I believe there is no such implementation, but decision trees are so simple to implement that you shouldn't have any problem writing such a program on your own.
On the other hand, I don't think computing features on the fly would increase speed, because even if some feature was used to make an earlier split, it still has to be considered at the remaining splits, so for many records it would be recalculated many times (it may save memory, though). This could make sense in the case of a random forest, where only a random, limited subset of features is considered at each split -- still, RF is usable only as a classifier; it won't build you nice, human-interpretable decision trees.
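If you do roll your own, the core of training is just picking, at each node, the feature whose split gives the biggest entropy reduction. A minimal sketch (my own hypothetical data layout: a list of (features_dict, label) pairs, not any particular library):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def best_split(rows, feature_names):
        """rows: list of (features_dict, label) pairs.
        Returns the feature giving the largest information gain."""
        base = entropy([label for _, label in rows])
        best_feature, best_gain = None, 0.0
        for f in feature_names:
            partitions = {}                      # feature value -> labels of rows with that value
            for features, label in rows:
                partitions.setdefault(features[f], []).append(label)
            remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
            if base - remainder > best_gain:
                best_feature, best_gain = f, base - remainder
        return best_feature, best_gain

Recursing on each partition (and stopping when a node is pure or you run out of features) gives you the full tree.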

2 votes

Usually such packages (J48 trees in Weka in particular) allow you to specify a missing value in place of a feature value, which will be dealt with the same way the C4.5 algorithm would:

when we reach the node splitting on that attribute with a missing value, we send the instance down each possible branch, weighted proportionally to the number of training instances going down those branches, eventually accumulating the result at the leaf nodes.
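As a rough illustration of that weighted descent (my own sketch, not Weka's actual code), prediction with a missing attribute could accumulate class weights at the leaves like this:

    def classify_with_missing(node, instance, weight=1.0, totals=None):
        """Sketch of C4.5-style handling of a missing attribute value.
        Leaves look like {'label': ...}; internal nodes look like
        {'feature': f, 'children': {value: subtree}, 'counts': {value: n_training}}."""
        if totals is None:
            totals = {}
        if "label" in node:                             # leaf: accumulate this path's weight
            totals[node["label"]] = totals.get(node["label"], 0.0) + weight
            return totals
        value = instance.get(node["feature"])           # None means the value is missing
        if value is not None:
            classify_with_missing(node["children"][value], instance, weight, totals)
        else:
            n = sum(node["counts"].values())
            for v, child in node["children"].items():   # go down every branch,
                classify_with_missing(child, instance,  # weighted by training frequency
                                      weight * node["counts"][v] / n, totals)
        return totals

The predicted class is then whichever label ends up with the largest accumulated weight.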

Of course you could take a more aggressive approach and change the way the tree chooses the attributes to split on during the training phase. A naive way would be to assign weights to attributes and to multiply the splitting criterion (entropy, information gain, ...) by that weight as a penalty coefficient, so that "expensive" attributes would be less likely to be chosen as a splitting node.
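For instance (a naive sketch of that penalty idea, not anything Weka provides out of the box), you could scale each candidate's information gain by a cost factor, so expensive attributes need a proportionally larger gain to be selected:

    def choose_split(gains, costs):
        """gains: {feature: information gain at this node}.
        costs: {feature: relative evaluation cost, >= 1}; missing means cheap.
        Dividing by the cost is the same as multiplying by a weight in (0, 1]."""
        return max(gains, key=lambda f: gains[f] / costs.get(f, 1.0))

    print(choose_split({"cheap": 0.30, "expensive": 0.35}, {"expensive": 2.0}))  # -> cheap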

0 votes

Are you concerned about this at training time or at classification time? Since those periods are separate, you can play tricks to delay evaluation if it's the latter. There are no tricks you can play during training time: you have to provide all the features at the time of training. However, since that can happen outside your program, you don't have to be as concerned about calculation time. Training the tree is the most intensive part.

So I'd suggest getting all your data together, training the tree, taking the product of training, and then using lazy evaluation in your objects as you send them down the tree. Have your objects implement some interface for getting values, and use that interface to evaluate things lazily. If an object never hits a node requiring an expensive value, then you never have to evaluate it.
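A sketch of that kind of interface (hypothetical names, plain Python): the tree only asks for the features it actually reaches, and each one is computed at most once per object:

    class LazyInstance:
        """Computes feature values on demand and remembers them."""
        def __init__(self, feature_functions):
            self._functions = feature_functions   # feature name -> zero-argument callable
            self._cache = {}

        def get(self, name):
            if name not in self._cache:
                self._cache[name] = self._functions[name]()
            return self._cache[name]

    instance = LazyInstance({
        "age":        lambda: 42,
        "slow_score": lambda: sum(range(10_000_000)),   # pretend this is expensive
    })
    print(instance.get("age"))   # "slow_score" is never evaluated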

Your expensive calculations might not even be chosen as splits, in which case you eliminate the need to evaluate them at classification time. You'll probably only have 3-5 features that are statistically relevant once you train and prune your tree. Then you can optimize only those features, with caching and such, so classification is fast.
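If the surviving feature functions take arguments (say, a record id), something as simple as functools.lru_cache keeps repeated classifications from recomputing them (the feature function below is a made-up stand-in):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def blurry_vision_score(patient_id):
        # Stand-in for an expensive feature function; computed once per patient,
        # then served from the cache on later classifications.
        return hash(patient_id) % 2 == 0

    print(blurry_vision_score("patient-001"))   # computed
    print(blurry_vision_score("patient-001"))   # cached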

If you want incremental training, that's a whole other ball of wax, and there are algorithms for that. But they aren't explained very well, and you'll have to dig into research papers to get them.

0 votes

So here is what I'd do. Given the answer to my previous question, I think you have something like the following. It sounds like you want to implement some sort of twenty-questions-like approach.

With twenty questions you have yes/no answers, so a binary tree works best. However, you could layer in multiple-choice options where the user picks one choice. This algorithm assumes you've trained your tree ahead of time and that it's been built from the dataset you wish to use (a rough code sketch of the walk is at the end of this answer).

Say, for example, we're trying to do a medical diagnosis, so our data might look like the following:

Disease Name  Head Ache   Fever  Back Pain  Leg Pain  Blurry Vision  Hearing Loss
Common Cold   Yes         Yes    No         No        No             No
Migraine      Yes         No     No         No        Yes            No
Herpes        No          Yes    No         No        No             No

In this example, Head Ache, Fever, Back Pain, Leg Pain, etc. are the influencers, and Disease Name is the target. Each row would be an actual diagnosis of a single patient, so a disease could be repeated in the data more than once.

  1. Modify a walk algorithm to start at the root.
  2. If you've reached a leaf, tell the user the potential answers.
  3. Otherwise, take the influencer used to split this node, present it to the user, and ask the "Yes/No" question (Do you have a head ache?).
  4. Go left if the user answers Yes.
  5. Go right if the user answers No.
  6. Go to step 2.

In the leaf nodes you'll have the actual rows that made it to that location, so you can display them to the user, saying you might have one of these:

Head ache, Migraine, Severed Head

Prescription is: blah blah blah.

With 1 million influencers it will take a while to build the tree. If you wanted to lower that, it might be possible to use multi-valued influencers instead of yes/no, although it's really hard to think of 1 million unique yes/no questions even covering every medical condition. Once you build the tree, it can offer as many diagnoses as you want.
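A rough sketch of steps 1 through 6 in Python (the hand-built toy tree below just stands in for whatever the training step would produce):

    def ask_yes_no(question):
        return input(question + " (yes/no): ").strip().lower().startswith("y")

    def diagnose(node):
        """Interactive walk of a trained binary tree (steps 1-6 above)."""
        while "question" in node:                    # internal node: ask and descend
            node = node["yes"] if ask_yes_no(node["question"]) else node["no"]
        print("You might have one of:", ", ".join(node["diagnoses"]))

    tree = {
        "question": "Do you have a head ache?",
        "yes": {"question": "Do you have blurry vision?",
                "yes": {"diagnoses": ["Migraine"]},
                "no":  {"diagnoses": ["Common Cold"]}},
        "no":  {"diagnoses": ["Herpes"]},
    }

    # diagnose(tree)   # uncomment to run the interactive walk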

0 votes

The Python decision tree package at https://pypi.python.org/pypi/DecisionTree has an interactive mode for using a decision tree. It is not incremental in the sense you need. However, it may be possible to easily change the code in the function for interactive operation to allow you to see the results incrementally.

0 votes

See gaenari.

It is a C++ incremental decision tree algorithm. It minimizes concept-drift damage.

It is written in C++, but it can be extended to Python and Java.