
My Weka OneR models all return what looks like an overfit rule set, ending with a question mark that maps to a fixed result, like so:

FollowersMeanCoords_Col:
    < 0.33340000000000003   -> False
    >= 0.33340000000000003  -> True
    ?   -> False
(114357/163347 instances correct)

Is OneR simply saying "I can't find anything, so we assume the rest is false"? But then, why is there a clear cut in the data (everything below 0.33 is False, everything above or equal is True)? And is there a way to prevent this?

Thanks in advance!

You could also try the OneR package from CRAN: CRAN.R-project.org/package=OneR. I am the developer of that package and would be interested in the result with your data set. - vonjd

1 Answer


The ? refers to missing values - your training data must have some values of FollowersMeanCoords_Col missing for some instances.

The model in your question says that if FollowersMeanCoords_Col for an instance (data point) is less than 0.3334..., or is missing, it will classify the instance as False, otherwise it will classify it as True.
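To make the three branches concrete, here is a minimal Python sketch of how the printed rule could be applied to a single value. The function name `classify` and the use of `None`/NaN to stand in for Weka's `?` are assumptions for illustration; this is not Weka's own code.

```python
import math

# Cut point copied from the model output in the question.
THRESHOLD = 0.33340000000000003


def classify(followers_mean_coords_col):
    """Apply the printed rule; None or NaN stands in for Weka's '?' (assumption)."""
    value = followers_mean_coords_col
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False  # the "?" branch: missing value
    if value < THRESHOLD:
        return False  # the "< 0.3334..." branch
    return True       # the ">= 0.3334..." branch


print(classify(0.2), classify(0.5), classify(None))
```

So the `?` line is just a third branch of the same rule, covering instances where the attribute has no value.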

OneR is a very simple classification algorithm which works by finding the one attribute from the training data that gives the least error when used to make a classification rule. For OneR to overfit there would need to be an attribute that happened to classify the training data well, but didn't generalise to future test data. It's more likely that OneR will give you models that are robust but inaccurate.
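The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Weka's implementation: it handles nominal attributes only (a numeric attribute such as FollowersMeanCoords_Col would first have to be split into intervals), it treats a missing value (`None`) as its own category, and the toy data and the name `one_r` are made up for the example.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single attribute with the fewest training errors."""
    best = None
    for attr in attributes:
        # Count class labels per attribute value; None acts as the "missing" bucket.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row.get(attr)][row[target]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors: instances that are not in the majority class of their bucket.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy data, not taken from the question.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": None, "windy": "no", "play": "no"},
]

attr, rule, errors = one_r(rows, ["outlook", "windy"], "play")
print(attr, rule, errors)
```

Because the rule uses only one attribute and a handful of branches, it has little capacity to memorise the training set, which is why a OneR model tends to be stable rather than overfit.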