I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it does not print "A".

    attVals = new FastVector();
    attVals.addElement("A");
    attVals.addElement("B");
    atts.addElement(new Attribute("class", attVals));

    vals[index] = attVals.indexOf("A");

The program's output is:

    {0 6,2 8}

but I expected {0 6,2 8,3 A}.

But when I do

    vals[index] = attVals.indexOf("B");

I get the expected output:

    {0 6,2 8,3 B}

For some reason the value at index 0 is not being written. Can someone tell me why this is happening?

1 Answer

This is a very common problem. The sparse format by definition does not store 0 values.

The Weka ARFF format page says:

Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the arff file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change). To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as Sparse ARFF files.

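In your case "A" has index 0 in attVals, so it is omitted from the sparse output. You have to put a dummy value in the first position so that "A" gets index 1. Just modify your code to: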

    attVals = new FastVector();
    attVals.addElement("dummy");
    attVals.addElement("A");
    attVals.addElement("B");
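
To see why the dummy entry changes the output, here is a rough sketch that does not use Weka at all. It is a simplified stand-in for how a sparse row gets written, not Weka's actual implementation; the class `SparseRowSketch` and the helper `toSparseRow` are hypothetical names made up for illustration. The only behaviour it models is the one described above: any attribute whose internal value is 0 is skipped.

```java
import java.util.Arrays;
import java.util.List;

public class SparseRowSketch {

    // Hypothetical helper (not part of Weka): formats one row in sparse style,
    // skipping every attribute whose internal value is 0.
    static String toSparseRow(double[] vals, int classIndex, List<String> classVals) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (int i = 0; i < vals.length; i++) {
            if (vals[i] == 0) {
                continue; // 0 is treated as "absent", even when it is a label index
            }
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(i).append(' ');
            if (i == classIndex) {
                // The class column stores an index into the list of labels.
                sb.append(classVals.get((int) vals[i]));
            } else {
                sb.append((int) vals[i]);
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        List<String> plain = Arrays.asList("A", "B");
        List<String> padded = Arrays.asList("dummy", "A", "B");

        // "A" has index 0 in the plain list, so the class column is dropped.
        System.out.println(toSparseRow(new double[] {6, 0, 8, plain.indexOf("A")}, 3, plain));

        // With the dummy at index 0, "A" has index 1 and is written.
        System.out.println(toSparseRow(new double[] {6, 0, 8, padded.indexOf("A")}, 3, padded));
    }
}
```

In this model the first call gives the row without the class label and the second gives the row with `3 A`, which mirrors the behaviour in the question.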

Let me know if you need any further help.