5
votes

I have a set of files, each containing a unique identifier. I use Weka programmatically to create a training ARFF file. Each instance in the ARFF file holds the attributes I extracted from one file, so there is one instance per file. How can I link the identifier of each file with the corresponding instance in the ARFF file? Thank you very much in advance.

2
I don't understand the question completely. Have you tried just adding a new feature that holds the identifier? - kutschkem
Can you please add code snippets for more clarity? - Chris

2 Answers

4
votes

You can associate an identifier with each instance by creating an extra attribute, as described here.

Thus, in your case, you would create a string attribute and set its value on each instance. When training and testing your classifier, you will want to remove the identifier so that it is not used as a feature. This is easily done with the RemoveType filter, which by default removes string attributes; the Remove filter also works if you specify the attribute index.
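To make the file layout concrete, here is a minimal sketch in plain Java (no Weka dependency) that writes ARFF text with the identifier as the first, string-typed attribute. The attribute names (id, f1, f2), the class labels (pos, neg) and the sample identifiers are all hypothetical placeholders, and the quoting helper is a simplified assumption rather than Weka's own routine.

```java
import java.util.ArrayList;
import java.util.List;

public class ArffWithId {

    /** One row per source file: a hypothetical id, two numeric features and a class label. */
    public static final class Row {
        final String id;
        final double f1;
        final double f2;
        final String label;

        public Row(String id, double f1, double f2, String label) {
            this.id = id;
            this.f1 = f1;
            this.f2 = f2;
            this.label = label;
        }
    }

    /** Wraps a string value in single quotes, escaping backslashes and single quotes. */
    public static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds the ARFF text; the id is declared as a string attribute in the first column. */
    public static String toArff(List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation files\n\n");
        sb.append("@attribute id string\n");
        sb.append("@attribute f1 numeric\n");
        sb.append("@attribute f2 numeric\n");
        sb.append("@attribute class {pos,neg}\n\n");
        sb.append("@data\n");
        for (Row r : rows) {
            sb.append(quote(r.id)).append(',')
              .append(r.f1).append(',')
              .append(r.f2).append(',')
              .append(r.label).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("doc-001", 0.5, 3.0, "pos"));
        rows.add(new Row("doc-002", 1.25, 7.0, "neg"));
        System.out.print(toArff(rows));
    }
}
```

With a file laid out like this, the identifier travels with every instance, and the string column is the one you would strip with RemoveType (or Remove with index 1) before the classifier sees the data.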

If you are running Weka from the command line, you can use the -p option to output the predictions together with attributes (even attributes that are filtered out); see the bottom of the first link.

1
votes

If I understand correctly, you want to link two or more ARFF files together.

Let's assume that we have two ARFF files named file1.arff and file2.arff.

You can run the following from the command line:

java weka.core.Instances append file1.arff file2.arff

Cheers