I'm trying to get Weka to make predictions from the command line, but I'm concerned I might be doing this wrong. I read the Data Mining book and searched the Weka site for documentation, yet what I found was vague at best, so I hope you can help me.
First, I created a training set (train.arff). Here's a sample:
@relation test
@attribute 'A' {0,1}
@attribute 'B' {0,1}
@attribute 'C' {0,1}
@attribute 'D' {0,1}
@attribute 'E' {0,1}
@attribute 'F' {0,1}
@data
0,0,0,0,0,0
0,0,0,0,0,0
...
Then I created a data set to be completed by prediction (test.arff):
@relation test
@attribute 'A' {0,1}
@attribute 'B' {0,1}
@attribute 'C' {0,1}
@attribute 'D' {0,1}
@attribute 'E' {0,1}
@attribute 'F' {0,1}
@data
0,?,0,0,0,0
0,?,0,0,0,0
...
The "?" marks the attribute that should be predicted.
Finally, I attempted to get the predictions by running this on the command line:
java weka.classifiers.trees.J48 -t train.arff -T test.arff -p 0
It produces the following output:
=== Predictions on test data ===
inst# actual predicted error prediction
1 2:1 2:1 0.939
2 2:1 2:1 0.939
For each data row, identified by inst#, I then took the number after the ":" in the predicted column as the predicted value.
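The extraction step described above can be sketched in Python as follows. This is only a minimal illustration, not part of Weka: the helper name `parse_predictions` is hypothetical, and it assumes the whitespace-separated layout shown in the output above (instance number first, predicted value third, confidence last).

```python
def parse_predictions(output):
    """Collect (instance number, predicted label, confidence) from -p output.

    Assumes each data row looks like '1 2:1 2:1 0.939', as in the sample
    above; parse_predictions is a hypothetical helper, not a Weka API.
    """
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows start with an integer instance number; skip header lines.
        if len(tokens) < 4 or not tokens[0].isdigit():
            continue
        predicted = tokens[2]
        if ":" not in predicted:
            continue
        # Keep the part after the ":" in the predicted column.
        label = predicted.split(":", 1)[1]
        rows.append((int(tokens[0]), label, float(tokens[-1])))
    return rows


sample = """=== Predictions on test data ===
inst# actual predicted error prediction
1 2:1 2:1 0.939
2 2:1 2:1 0.939
"""

rows = parse_predictions(sample)
for inst, label, confidence in rows:
    print(inst, label, confidence)
```

Taking the confidence from the last token rather than a fixed position is a deliberate choice here, since the error column can be empty in a row.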
Here are my questions:
Is this correct? I'm concerned about the "?" because I read that missing values may be imputed (although that may happen only during the learning phase).
Does Weka support multiple predictions? No matter how many fields are marked with "?", I always get the same table with only one predicted value per instance.
Can Weka generate a complete (predicted) ARFF file, or do I have to construct this myself from its results?
If I missed something glaringly obvious, I apologize in advance; any pointers to relevant documentation would be greatly appreciated.
Thanks in advance!