I am trying to understand how SimpleKMeans in Weka handles nominal attributes and why it is not efficient at handling them.
I have read that it uses the mode of such attributes when computing cluster centroids, but I want to know how the similarity (distance) itself is calculated.
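Here is my current understanding of the centroid step, sketched in Python: numeric attributes take the mean of the cluster members and nominal attributes take the mode. This is a minimal sketch under that assumption, not Weka's actual code. `centroid` and `nominal_idx` are hypothetical names, and Python's tie-breaking for the mode (first value encountered) may differ from Weka's.

```python
from collections import Counter
from statistics import mean

def centroid(cluster, nominal_idx):
    """Assumed centroid rule: mean for numeric attributes, mode for nominal ones.

    cluster: list of instances (tuples); nominal_idx: set of nominal column indices.
    """
    result = []
    for j in range(len(cluster[0])):
        column = [inst[j] for inst in cluster]
        if j in nominal_idx:
            # Most frequent value; ties go to the value seen first (Python's rule).
            result.append(Counter(column).most_common(1)[0][0])
        else:
            result.append(mean(column))
    return tuple(result)

# Hypothetical cluster: three numeric attributes plus one nominal attribute.
cluster = [(1.0, 2.0, 3.0, "A"), (2.0, 4.0, 5.0, "B"), (3.0, 6.0, 1.0, "A")]
print(centroid(cluster, nominal_idx={3}))  # (2.0, 4.0, 3.0, 'A')
```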
Let's take an example: consider a dataset with three numeric attributes and one nominal attribute. The nominal attribute has three values: A, B and C.
Instance1 has value A, Instance2 has value B and Instance3 has value A. In this case, Instance1 may be more similar to Instance3 (depending, of course, on the numeric attributes). How will SimpleKMeans handle this case?
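For the distance, I assume SimpleKMeans uses Weka's default EuclideanDistance, which (as far as I can tell) scales each numeric difference by the attribute's range and scores a nominal attribute as 0 when the values match and 1 when they differ. The sketch below applies that assumption to the example above. The numeric values are made up, and `distance` and `ranges` are hypothetical names.

```python
import math

def distance(x, y, nominal_idx, ranges):
    """Assumed Euclidean distance: range-scaled numeric differences, 0/1 for nominal.

    ranges[j] = (min, max) of numeric attribute j over the data.
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if j in nominal_idx:
            diff = 0.0 if a == b else 1.0  # match -> 0, mismatch -> 1
        else:
            lo, hi = ranges[j]
            diff = (a - b) / (hi - lo) if hi > lo else 0.0
        total += diff * diff
    return math.sqrt(total)

# Hypothetical numeric values; the nominal values follow the example.
inst1 = (1.0, 5.0, 3.0, "A")
inst2 = (2.0, 6.0, 4.0, "B")
inst3 = (1.5, 5.0, 3.0, "A")
data = [inst1, inst2, inst3]
nominal_idx = {3}
ranges = {j: (min(r[j] for r in data), max(r[j] for r in data)) for j in range(3)}

print(distance(inst1, inst2, nominal_idx, ranges))  # 2.0
print(distance(inst1, inst3, nominal_idx, ranges))  # 0.5
```

Under this assumption the nominal attribute adds exactly 1 to the squared distance between Instance1 and Instance2 and nothing between Instance1 and Instance3; the numeric attributes decide the rest.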
Follow-up: what if the nominal attribute has more possible values, say 10?
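If the 0/1 rule is right, a nominal mismatch costs the same whether the attribute has 3 or 10 values, while the mode keeps only the single most frequent value of each cluster. The hypothetical 20-member clusters below count how many members disagree with the mode in each case; `mode_mismatches` is an illustrative name, not a Weka method.

```python
from collections import Counter

def mode_mismatches(values):
    """Return the mode and how many members differ from it.

    Under the assumed 0/1 rule, each such member adds 1 to its squared
    distance from the centroid on this attribute.
    """
    mode, count = Counter(values).most_common(1)[0]
    return mode, len(values) - count

# Hypothetical clusters of 20 instances each.
three_values = ["A"] * 8 + ["B"] * 6 + ["C"] * 6
ten_values = ["V0"] * 3 + [f"V{i}" for i in range(1, 9) for _ in range(2)] + ["V9"]

print(mode_mismatches(three_values))  # ('A', 12)
print(mode_mismatches(ten_values))    # ('V0', 17)
```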