Exception with duplicate label when transforming data from Numeric to Nominal via python-weka-wrapper v0.3.10

Question

I've been stuck for a couple of days on a data pre-processing problem with python-weka-wrapper v0.3.10.

I'm using create_instances_from_matrices() to convert my dataset from an ndarray into Instances, with all attributes of numeric type.

I then save the dataset to an ARFF file (numeric_data.arff) via

Saver(classname="weka.core.converters.ArffSaver")

Then I tried to convert the dataset to nominal type with

Filter(classname="weka.filters.unsupervised.attribute.NumericToNominal", options=["-R", "first-last"])

The exception message looks like this:

Exception in thread "Thread-0" java.lang.IllegalArgumentException: A nominal attribute (x2) cannot have duplicate labels (1).

However, the same dataset (numeric_data.arff) can be converted to nominal type successfully in the Weka GUI Explorer v3.8.1.

I'd appreciate any ideas that might help.

Thanks!

fracpete · Accepted Answer · 2017-04-16T20:44:25

The problem is most likely that you have small values (< 10^-6), all of which Weka turns into 0.0 when saving (by default, Weka only outputs 6 digits after the decimal point). Values that were distinct in memory then become identical, and NumericToNominal ends up with duplicate labels. If you apply your filter to the dataset before you save it, it should work.
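To make the effect concrete, here is a small pure-Python illustration. It does not use Weka at all; it only mimics writing numbers with a fixed number of decimals and reading them back. The sample values and the helper names round_trip() and duplicates() are made up for this sketch.

```python
def round_trip(values, decimals=6):
    """Simulate writing each value with a fixed number of decimals and reading it back."""
    return [float(f"{v:.{decimals}f}") for v in values]


def duplicates(values):
    """Return the values that occur more than once, in order of first appearance."""
    seen, dupes = set(), []
    for v in values:
        if v in seen and v not in dupes:
            dupes.append(v)
        seen.add(v)
    return dupes


# Hypothetical sample data: two values smaller than 10^-6 plus two ordinary ones.
original = [1e-7, 4e-7, 0.25, 1.0]
reloaded = round_trip(original)  # 6 decimals, like the default described above

print(duplicates(original))  # nothing collides at full precision
print(duplicates(reloaded))  # the two tiny values have collapsed into one
```

With more decimals, e.g. round_trip(original, decimals=10), the tiny values stay distinct, which is why increasing the saved precision should avoid the duplicate labels.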

Alternatively, you can tell the ArffSaver how many decimals you would like to use when saving the file (-decimal option). See also the Javadoc of the ArffSaver class.
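Both suggestions could be combined roughly as follows. This is an untested sketch, not a definitive implementation: the helper names arff_saver_options() and save_and_convert() are hypothetical, the value of 10 decimals is an arbitrary assumption, and the JVM is assumed to have been started already with weka.core.jvm.start().

```python
def arff_saver_options(decimals):
    """Build the ArffSaver option list for the -decimal option mentioned above."""
    return ["-decimal", str(decimals)]


def save_and_convert(x, y, path, decimals=10):
    """Save the numeric data with extra precision and return a nominal copy.

    Assumes the JVM is already running (weka.core.jvm.start()).
    """
    # Imported here so that this snippet can be loaded without Weka installed.
    from weka.core.converters import Saver
    from weka.core.dataset import create_instances_from_matrices
    from weka.filters import Filter

    # Numeric dataset built from the ndarrays, as in the question.
    data = create_instances_from_matrices(x, y, name="numeric_data")

    # Option 2: keep more digits when writing the numeric ARFF file.
    saver = Saver(classname="weka.core.converters.ArffSaver",
                  options=arff_saver_options(decimals))
    saver.save_file(data, path)

    # Option 1: filter the in-memory data, which never went through the saver.
    to_nominal = Filter(
        classname="weka.filters.unsupervised.attribute.NumericToNominal",
        options=["-R", "first-last"])
    to_nominal.inputformat(data)
    return to_nominal.filter(data)
```

Call save_and_convert(x, y, "numeric_data.arff") between jvm.start() and jvm.stop(); pick the number of decimals to match the smallest differences in your data.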

