1
votes

I am working on the KDD99 dataset using WEKA. The dataset has three types of attributes: nominal, binary, and numeric. WEKA, however, treats the binary attributes as numeric.

I tried to normalize the data with the unsupervised attribute filter Normalize, but it also normalizes the binary attributes. I have two questions:

  1. Do I need to normalize the binary attributes at all, given that binary data is not continuous?

  2. If I do not need to normalize the binary attributes, how can I restrict the Normalize filter in WEKA to selected attributes? It always applies to all numeric attributes (including the binary ones).

Thanks!

1 Answer

1
votes

Weka has interpreted the binary attributes in your input file as numeric because their values are all numbers (0 and 1). If you are going to use classifiers that can handle nominal attributes, you probably want to convert the binary attributes into nominal ones instead.

You can do this with the weka.filters.unsupervised.attribute.Discretize filter. Set its attribute indices to the indices of the binary attributes and set the number of bins to 2.
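To see why two bins are enough for a 0/1 attribute, here is a minimal sketch of equal-width binning in plain Java. It does not call the Weka API and is not Weka's actual implementation. The class name EqualWidthBinning and its methods are made up for illustration. With a minimum of 0, a maximum of 1 and two bins, the single cut point falls at 0.5, so every 0 lands in the first bin and every 1 in the second.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Weka.
class EqualWidthBinning {

    // Cut points that split [min, max] of the column into `bins` equal-width intervals.
    static double[] cutPoints(double[] values, int bins) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        double[] cuts = new double[bins - 1];
        for (int k = 1; k < bins; k++) {
            cuts[k - 1] = min + k * width;
        }
        return cuts;
    }

    // Zero-based bin index for every value; a value on a cut point goes to the lower bin,
    // which matches half-open interval labels of the form (a-b].
    static int[] binIndices(double[] values, int bins) {
        double[] cuts = cutPoints(values, bins);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int idx = 0;
            while (idx < cuts.length && values[i] > cuts[idx]) {
                idx++;
            }
            out[i] = idx;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] binaryColumn = {0, 1, 1, 0}; // made-up sample of a 0/1 attribute
        System.out.println(Arrays.toString(cutPoints(binaryColumn, 2)));
        System.out.println(Arrays.toString(binIndices(binaryColumn, 2)));
    }
}
```

The two bin indices then play the role of the two nominal values that the filter produces.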

This will give you attributes with the nominal value labels (-inf-0.5] and (0.5-inf). If you would rather see them as 0 and 1, you can rename the values using weka.filters.unsupervised.attribute.RenameNominalValues.
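On the first question in the post, it may help to see what min-max scaling to the range [0, 1] does to a column that already contains both 0 and 1. As far as I know, this is what the Normalize filter does with its default settings. The sketch below is plain Java, not the Weka API. The class name MinMaxCheck and the sample values are made up, and mapping a constant column to 0 is an assumption of this sketch, not a statement about Weka's behaviour.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Weka.
class MinMaxCheck {

    // Rescales a column to [0, 1] using (v - min) / (max - min).
    static double[] minMax(double[] values) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption of this sketch: a constant column is mapped to 0.
            out[i] = (range == 0) ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] binaryColumn = {0, 1, 1, 0};   // made-up 0/1 attribute
        double[] numericColumn = {0, 50, 200};  // made-up continuous attribute
        System.out.println(Arrays.toString(minMax(binaryColumn)));
        System.out.println(Arrays.toString(minMax(numericColumn)));
    }
}
```

In this sketch the binary column comes out unchanged, because its minimum is 0 and its range is 1, while the continuous column is rescaled. A binary attribute that only ever takes one value in the data would be the case to check separately.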