
If one of my columns in the data set has just 3 possible values, i.e. 0, 1 and 2, how differently does WEKA treat them if I declare them as nominal vs. numeric?

Also, if a column has a large number of distinct nominal values, is there an easy way to declare a nominal attribute with that many values?


2 Answers


Roughly speaking (it depends on the actual algorithm):

When treated as numeric, the difference between 1 and 3 will be roughly twice as big as the difference between 1 and 2 (assuming there are no other attributes).

When treated as nominal values (labels rather than numbers), 1 vs. 2 and 1 vs. 3 are probably equally different, since '1' != '2' and '1' != '3'. (However, the result may also depend on how frequent each value is; common dissimilarity measures for categorical data involve relative frequencies.)
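To make the contrast concrete, here is a minimal hand-written sketch, not Weka's own code. It assumes the numeric difference is scaled by the attribute's observed range (a common choice in distance-based learners) and uses the simplest nominal dissimilarity, an overlap measure that scores 0 for a match and 1 for a mismatch. The frequency-based measures mentioned above are not shown. The class and method names are hypothetical.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not Weka's implementation): compares a range-normalised
 * numeric difference with an overlap (match/mismatch) nominal difference.
 */
public class AttributeDistanceSketch {

    // Numeric treatment: absolute difference divided by the observed range,
    // so 1 vs 3 comes out twice as far apart as 1 vs 2.
    static double numericDiff(double a, double b, double min, double max) {
        double range = max - min;
        if (range == 0) {
            return 0.0; // constant attribute: all values are identical
        }
        return Math.abs(a - b) / range;
    }

    // Nominal treatment (overlap measure): 0 if the labels are equal, 1 otherwise,
    // so "1" vs "2" and "1" vs "3" are equally different.
    static double nominalDiff(String a, String b) {
        return Objects.equals(a, b) ? 0.0 : 1.0;
    }

    public static void main(String[] args) {
        // Assumed observed range of the attribute: 1..3.
        double min = 1, max = 3;
        System.out.println("numeric 1 vs 2: " + numericDiff(1, 2, min, max));
        System.out.println("numeric 1 vs 3: " + numericDiff(1, 3, min, max));
        System.out.println("nominal 1 vs 2: " + nominalDiff("1", "2"));
        System.out.println("nominal 1 vs 3: " + nominalDiff("1", "3"));
    }
}
```

With numeric treatment the two differences come out as 0.5 and 1.0; with nominal treatment both are 1.0, which is the "twice as big" versus "equally different" distinction described above.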


How numeric and categorical attributes are treated depends on the specific machine learning algorithm within Weka that you're using. Some algorithms cannot handle both types of attribute, and if you select such an algorithm for data with an unsupported attribute type, Weka will tell you.

In general, you should declare attributes as what they really are: if a value is numeric, declare it as numeric even if there are only a few distinct values. Likewise, if the attribute is categorical, declare it as such even if there are many distinct values.

Regarding your last question, I don't think Weka distinguishes between categorical attributes with few and with many distinct values; such an attribute is declared the same way as any other nominal attribute.
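In an ARFF file, a numeric attribute is declared as `@attribute name numeric`, while a nominal attribute must list its values in braces, e.g. `@attribute name {0,1,2}`. For an attribute with many values, one option is to generate that list from the data instead of typing it. The following is a minimal sketch under the assumption that the values contain no spaces, commas, braces or quotes (which would need quoting in ARFF); the class and method names are hypothetical, not part of Weka.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: builds ARFF @attribute lines, declaring a column either
 * as numeric or as nominal with its value list collected from the data.
 * Assumes values need no ARFF quoting (no spaces, commas, braces or quotes).
 */
public class ArffHeaderSketch {

    // Numeric declaration: no value list is needed.
    static String numericAttribute(String name) {
        return "@attribute " + name + " numeric";
    }

    // Nominal declaration: every distinct value in the column, in first-seen
    // order, listed inside braces.
    static String nominalAttribute(String name, List<String> column) {
        Set<String> distinct = new LinkedHashSet<>(column);
        return "@attribute " + name + " {" + String.join(",", distinct) + "}";
    }

    public static void main(String[] args) {
        // The same 0/1/2 column declared both ways.
        List<String> small = List.of("0", "1", "2", "1", "0");
        System.out.println(numericAttribute("level"));
        System.out.println(nominalAttribute("level", small));

        // A high-cardinality column: 1000 distinct codes, generated here
        // purely for illustration.
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            codes.add("code" + i);
        }
        String decl = nominalAttribute("product", codes);
        System.out.println(decl.substring(0, 50) + "...");
    }
}
```

Using a `LinkedHashSet` removes duplicates while keeping the order in which values first appear; sorting the values instead would work just as well if a stable, alphabetical declaration is preferred.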