I have a couple of attributes with missing values.

This is a survey, so the fact that the person refused to answer is, by itself, useful information!

I would like to create a new attribute, is-missing-value, that is 1 if the value of a given attribute is missing and 0 otherwise.
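To make the target concrete, here is a minimal Python sketch (outside Weka) of the attribute I'm after. It assumes the column is a plain list in which a refused answer is stored as None or NaN (in an ARFF file it would appear as ?). The function name missing_indicator is hypothetical, used only for illustration.

```python
from typing import List, Optional, Sequence


def missing_indicator(column: Sequence[Optional[float]]) -> List[int]:
    """Return 1 where the value is missing (None or NaN), 0 otherwise."""
    indicator = []
    for value in column:
        # NaN is the only float value that is not equal to itself.
        is_missing = value is None or value != value
        indicator.append(1 if is_missing else 0)
    return indicator


# Example: a survey answer column where two respondents refused to answer.
answers = [3.0, None, 5.0, float("nan"), 1.0]
print(missing_indicator(answers))  # [0, 1, 0, 1, 0]
```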

Things I have tried:

  • I have tried using AddExpression, but it seems to support only arithmetic operations, such as 2*attribute.
  • I know that MathExpression supports if-else expressions, such as ifelse(A < 3.0, 1, 0). Do you know whether (and how) I can test if a value is NaN?
  • MakeIndicator (or NominalToBinary) should be able to do what I want, but I think I first need to (i) replace my missing values with a new nominal value, so that (ii) I can then convert this new nominal value to binary. The problem is that ReplaceMissingValues only fills in the mode or the mean; I need to be able to define my own value. One solution would be to edit the data directly, but I'd rather avoid that.

Please note that I need to do this using the Weka GUI, not the Java API.

I think I have a solution for you:

  1. Copy the attribute (if you want to keep the original one): apply the Copy filter (this and the following filters are all under the unsupervised/attribute folder) with the index of the attribute.
  2. Convert your attribute to nominal using the NumericToNominal filter (set the attribute index).
  3. Replace the missing values with a new value using ReplaceMissingWithUserConstant. Here you need to set the nominalStringReplacementValue parameter (e.g., "missing") in addition to the index of your attribute.
  4. Apply the NominalToBinary filter to your attribute. This creates several new binary attributes, one for each distinct value of the attribute, including the new "missing" value. You can then remove the attributes you don't need and keep only the one for "missing".
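To see what the four steps compute, here is a rough Python analogue of the same chain on a plain list. It is not Weka code and not how Weka implements these filters; it assumes a missing answer is represented as None or NaN, and the names one_hot_with_missing and MISSING_LABEL are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for ReplaceMissingWithUserConstant's nominalStringReplacementValue.
MISSING_LABEL = "missing"


def is_missing(value: Optional[float]) -> bool:
    """Treat both None and NaN as a missing answer."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def one_hot_with_missing(column: Sequence[Optional[float]]) -> Dict[str, List[int]]:
    # Step 1 (Copy): work on a copy so the original column is left untouched.
    copied = list(column)
    # Step 2 (NumericToNominal): each distinct number becomes a nominal label;
    # missing values stay missing (None).
    nominal = [None if is_missing(v) else str(v) for v in copied]
    # Step 3 (ReplaceMissingWithUserConstant): missing -> the new label.
    filled = [MISSING_LABEL if v is None else v for v in nominal]
    # Step 4 (NominalToBinary): one 0/1 column per distinct label.
    labels = sorted(set(filled))
    return {label: [1 if v == label else 0 for v in filled] for label in labels}


answers = [3.0, None, 5.0, float("nan"), 1.0]
columns = one_hot_with_missing(answers)
print(sorted(columns))  # ['1.0', '3.0', '5.0', 'missing']

# Keep only the indicator column; if nothing was missing there is no such
# column, so fall back to all zeros.
is_missing_value = columns.get(MISSING_LABEL, [0] * len(answers))
print(is_missing_value)  # [0, 1, 0, 1, 0]
```

The final lookup corresponds to deleting the other binary attributes in Weka and keeping only the "missing" one.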

Hope this helps.