We are using Python 2.7.12 with the Weka 3.9.3 API (and Java 1.8.0_191) to classify with the J48 classifier. Below is a short piece of the Python code:

# Imports and JVM startup from python-weka-wrapper, needed by the excerpt below
import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.filters import Filter
from weka.classifiers import Classifier

jvm.start()

loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file(data_dir + "new_ALL_FEATURES.arff")
data.class_is_last()

# Separate the data into train and test sets:
removeRange = Filter(classname="weka.filters.unsupervised.instance.RemoveRange",
                     options=["-R", "4951-last"])
removeRange.inputformat(data)
train = removeRange.filter(data)

removeRange = Filter(classname="weka.filters.unsupervised.instance.RemoveRange",
                     options=["-R", "first-4951"])
removeRange.inputformat(data)
test = removeRange.filter(data)

cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.25"])
cls.build_classifier(train)

Additional details on the options used for the classification: pruned tree, no cross-validation. The data includes 23 numeric features (the class is nominal), and missing values were replaced. When we run the Weka GUI on the same file with the same classifier and the same options (["-C", "0.25", "-M", "2"]), the classification results differ from those we get from the API. The tree structure is different: the GUI tree has 77 leaves, while the tree built by the API has 97 leaves.

We searched for a similar problem and found the following question: "Different results in Weka GUI and Weka via Java code". However, it is not relevant to us, since we are not performing cross-validation.

What could be the cause of this difference? What are we missing? Please advise, thanks in advance.

1 Answer

UPDATE: We found the problem. Apparently, when we split the data via the API, we dropped one sample, and that caused the difference.
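
One way the listing in the question can lose a sample: RemoveRange treats its range as 1-based and inclusive, so "4951-last" and "first-4951" both remove instance 4951. The sketch below mimics that behaviour on a plain Python list instead of calling Weka. The helper `remove_range` and the dataset size of 6000 are assumptions made for illustration only; the real dataset size is not given above.

```python
def remove_range(instances, first, last):
    """Mimic RemoveRange -R first-last: drop 1-based, inclusive positions."""
    return [x for i, x in enumerate(instances, start=1)
            if not (first <= i <= last)]


total = 6000  # hypothetical size; the question does not state the real one
data = list(range(1, total + 1))  # each value stands for one instance number

train = remove_range(data, 4951, total)  # like "-R 4951-last"
test = remove_range(data, 1, 4951)       # like "-R first-4951"

# Instances that ended up in neither split
missing = sorted(set(data) - set(train) - set(test))
print(len(train), len(test), missing)
```

With these assumed numbers, the training set has 4950 instances and instance 4951 is in neither split. If the GUI run trained on 4951 instances, that one-instance difference would be enough to change the tree. Changing the second range to "first-4950", or reusing the same range with RemoveRange's -V (invert selection) option for one of the two filters, would keep the two sets complementary.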