2
votes

I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the mode (for nominal attributes) or mean (for numeric attributes) of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.

I would like to try removing the instances that contain missing values instead, to see how this affects the quality of the classifier. Is there a filter to do that?

2 Answers

2
votes

See the other answer below for a better, more modern approach.

My approach is not perfect: with more than 5 or 6 attributes it becomes quite cumbersome to apply. If only a few attributes have missing values, though, MultiFilter works for this purpose.

If you have missing values in 2 attributes, you add RemoveWithValues to the MultiFilter twice, once per attribute.

  1. Load your data in Weka Explorer
  2. Select MultiFilter from the Filter area
  3. Click on MultiFilter and add one RemoveWithValues filter per attribute that has missing values
  4. Configure each RemoveWithValues filter with the attribute index and set matchMissingValues to True
  5. Save the filter settings and click Apply in Explorer
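
To make the effect of the chain concrete, here is a minimal plain-Java sketch of what the steps above amount to: one removal pass per selected attribute. It does not use the Weka API. The class `MissingValueSketch`, the method `removeWithMissing`, and the `double[]` row layout are all hypothetical, and attribute indices here are 0-based. The only Weka detail it relies on is that Weka stores a missing value internally as `Double.NaN`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a MultiFilter holding several RemoveWithValues filters; not Weka API.
class MissingValueSketch {

    // Returns a copy of rows without the rows that are NaN at any of the given 0-based indices.
    static List<double[]> removeWithMissing(List<double[]> rows, int... attributeIndices) {
        List<double[]> kept = new ArrayList<>(rows);
        // One pass per attribute, mirroring one RemoveWithValues filter per attribute.
        for (int index : attributeIndices) {
            kept.removeIf(row -> Double.isNaN(row[index]));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {1.0, 2.0, 3.0},
            new double[] {Double.NaN, 2.0, 3.0},   // missing in attribute 0
            new double[] {1.0, Double.NaN, 3.0},   // missing in attribute 1
            new double[] {1.0, 2.0, Double.NaN});  // missing in attribute 2, which is not checked
        List<double[]> cleaned = removeWithMissing(rows, 0, 1);
        System.out.println(cleaned.size()); // prints 2
    }
}
```

Rows with a missing value in an attribute you did not list survive, which is why this approach gets tedious once many attributes are affected.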
1
votes

Call removeIf() on weka.core.Instances and pass it a method reference to hasMissingValue from weka.core.Instance, which returns true if the given Instance has any missing values. removeIf() comes from java.util.Collection, so this requires Java 8 or later.

Instances dataset = source.getDataSet(); // for some source
// Removes, in place, every instance that has at least one missing value
dataset.removeIf(Instance::hasMissingValue);