I'm using J48 to classify instances composed of both numeric and nominal values. My problem is that I don't know in advance which nominal values I'll come across while my program runs. Therefore, I need to update the model's nominal-attribute data "on the fly".

For instance, say I have only two attributes, occupation and age, and the run goes as follows: OccuptaionAttribute = {}.


input: [Piano teacher, 22].

OccuptaionAttribute = {Piano teacher}.


input: [school teacher, 30]

OccuptaionAttribute = {Piano teacher, school teacher}.


input: [Piano teacher, 40]

OccuptaionAttribute = {Piano teacher, school teacher}.


etc.

I've tried to do this manually by copying the previous attribute values, adding the new one, and then updating the model's data. That works fine when training the model.

But when I want to classify a new instance, say [SW engineer, 52], OccuptaionAttribute is updated to {Piano teacher, school teacher, SW engineer}, yet the tree itself has never "met" "SW engineer" before, so the classification fails and an exception is thrown.

Can you suggest how to handle this situation? Does Weka have any mechanism that supports it?

Thanks!

1 Answer

When training, add a placeholder value such as __other__ to your nominal attributes. Before classifying an instance, first check whether its nominal value has been seen before; if it has not, use the placeholder value instead:

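// "instances" is the weka.core.Instances dataset the model was trained on.
Attribute attribute = instances.attribute("OccuptaionAttribute");
String s = "SW engineer";
// indexOfValue returns -1 when the attribute has no such nominal value.
int index = attribute.indexOfValue(s);
if (index == -1) {
    // Unseen value: fall back to the placeholder declared at training time.
    index = attribute.indexOfValue("__other__");
}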

Once you have collected enough data containing the new values, train the model again so they become real values of the attribute.
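
To make the idea concrete without depending on Weka, here is a minimal plain-Java sketch of the same fallback. The class NominalFallback and its methods are hypothetical names invented for illustration, not part of Weka; the list only mimics the way Attribute.indexOfValue returns -1 for an unknown value. It also records the unseen values, which is one possible way to decide when it is worth retraining.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not a Weka class): a fixed nominal value list with a placeholder. */
class NominalFallback {
    static final String OTHER = "__other__";

    // Values declared when the model was trained, always including the placeholder.
    private final List<String> values;
    // Values first met at classification time, kept for a later retraining run.
    private final Set<String> unseen = new LinkedHashSet<>();

    NominalFallback(List<String> trainingValues) {
        values = new ArrayList<>(trainingValues);
        if (!values.contains(OTHER)) {
            values.add(OTHER);
        }
    }

    /** Returns the index of the value, or the placeholder's index if it is unknown. */
    int indexOrOther(String value) {
        int index = values.indexOf(value); // -1 when absent, as with Attribute.indexOfValue
        if (index == -1) {
            unseen.add(value);             // remember it so it can be added when retraining
            index = values.indexOf(OTHER);
        }
        return index;
    }

    /** Values that were mapped to the placeholder so far. */
    Set<String> unseenValues() {
        return unseen;
    }
}
```

With the values from the question, new NominalFallback(Arrays.asList("Piano teacher", "school teacher")) would resolve "SW engineer" to the index of __other__ and list it in unseenValues(). In real code the lookup would go through the Weka attribute as shown above; note that the tree can only learn something useful for __other__ if some training instances actually carry that value.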