To keep things simple, I'll explain with a smaller example.
I have two sets:
The first set has 10 unique string ids: id1, id2, id3, id4, id5, ..., id10.
The second set has 3 unique cids: cid1, cid2, cid3.
There is a mapping between the two sets, but no mapping among values within the same set.
For example, the mapping might be:
id1 : cid1,cid2
id2 : cid3
id3 : cid1
... and so on.
I need to cluster the ids (strings) by their cids (strings), and vice versa.
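A minimal sketch of the mapping as a Python dict, assuming only the three example rows above (the rest of the mapping is elided). Inverting it gives the cid-to-ids view needed for the "vice versa" direction; the function name `invert` is just illustrative.

```python
from collections import defaultdict

# Only the three example mappings from the question; the rest are not known here.
id_to_cids = {
    "id1": ["cid1", "cid2"],
    "id2": ["cid3"],
    "id3": ["cid1"],
}

def invert(mapping):
    """Return the reverse mapping (value -> list of keys that map to it)."""
    reverse = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            reverse[value].append(key)
    return dict(reverse)

cid_to_ids = invert(id_to_cids)
print(cid_to_ids)  # {'cid1': ['id1', 'id3'], 'cid2': ['id1'], 'cid3': ['id2']}
```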
Right now I have created a CSV file like the one below, with one (id, cid) pair per row (similar to a sparse representation):
id1,cid1
id1,cid2
id2,cid3
...
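One way to turn such pair rows into features is to pivot them into a binary matrix with one row per id and one column per cid (1 if the pair exists, 0 otherwise). This is a sketch using only the standard library, assuming the sample rows below (built from the example mapping) and a two-column file without a header; `pairs_to_matrix` is a hypothetical helper name.

```python
import csv
import io

# Sample rows derived from the example mapping (id1..id3 only).
SAMPLE_CSV = """id1,cid1
id1,cid2
id2,cid3
id3,cid1
"""

def pairs_to_matrix(lines):
    """Pivot (id, cid) rows into a binary id x cid matrix.

    Returns (row_ids, col_cids, matrix) where matrix[i][j] == 1 if
    row_ids[i] is mapped to col_cids[j], else 0.
    """
    # skipinitialspace and strip() tolerate "id1 , cid1"-style spacing;
    # blank rows (parsed as []) are skipped.
    pairs = [
        (row[0].strip(), row[1].strip())
        for row in csv.reader(lines, skipinitialspace=True)
        if row
    ]
    row_ids = sorted({a for a, _ in pairs})
    col_cids = sorted({b for _, b in pairs})
    row_index = {v: i for i, v in enumerate(row_ids)}
    col_index = {v: j for j, v in enumerate(col_cids)}
    matrix = [[0] * len(col_cids) for _ in row_ids]
    for a, b in pairs:
        matrix[row_index[a]][col_index[b]] = 1
    return row_ids, col_cids, matrix

ids, cids, m = pairs_to_matrix(io.StringIO(SAMPLE_CSV))
print(cids)
for id_, row in zip(ids, m):
    print(id_, row)
```

Note that plain string sorting puts "id10" before "id2"; that only affects row order, not the features. Transposing the matrix gives the cid x id view for clustering in the other direction.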
I ran k-means in Weka, but I'm not sure this is the right approach. All those ids are really features/attributes with no specific order. But with my current representation, the two columns are treated as attributes and the ids become their values. How can I convert them into features instead?
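For Weka, a binary matrix like that could be written as an ARFF file with one numeric 0/1 attribute per cid and one data row per id, which k-means can then treat as ordinary numeric features. This is a sketch under assumptions: the relation and attribute names are illustrative, the id itself is deliberately not written (row i corresponds to the i-th id), and Euclidean distance on 0/1 features is only one possible choice for this kind of data.

```python
import io

def matrix_to_arff(relation, feature_names, matrix):
    """Render a binary matrix as ARFF text: one numeric attribute per
    feature name and one data row per matrix row. Instance labels are not
    written, so data row i corresponds to matrix row i."""
    out = io.StringIO()
    out.write(f"@relation {relation}\n\n")
    for name in feature_names:
        out.write(f"@attribute {name} numeric\n")
    out.write("\n@data\n")
    for row in matrix:
        out.write(",".join(str(v) for v in row) + "\n")
    return out.getvalue()

# id x cid matrix for the example mapping (rows: id1, id2, id3).
cids = ["cid1", "cid2", "cid3"]
matrix = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
id_arff = matrix_to_arff("id_cid", cids, matrix)
print(id_arff)

# "Vice versa": transpose so each cid is an instance and each id a feature.
ids = ["id1", "id2", "id3"]
transposed = [list(col) for col in zip(*matrix)]
cid_arff = matrix_to_arff("cid_id", ids, transposed)
print(cid_arff)
```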