I am trying to run k-means clustering on the following data set:
Name,Gender,Age,Drinks,Country
John,M,30,Pepsi,US
Jack,M,25,Coke,US
David,M,34,Pepsi,UK
Ted,M,37,Limca,CAN
Robert,M,23,Limca,US
Adrian,M,31,Pepsi,US
Craig,M,37,Coke,UK
Katie,F,23,Limca,UK
Nancy,F,32,Pepsi,UK
I want to cluster the data based on Drinks (Pepsi, Coke, Limca), and I am able to do that. But I also want to retrieve the name alongside the clustered data.
The output I am getting is:
0
1
2
Limca belongs to cluster:0
Coke belongs to cluster:0
etc.
Here I want to get the names as well.
While converting to a sequence file, I take Drinks as the key and the rest of the text as the value, convert the value to a sparse vector, and then run k-means clustering. The names are not printed. Can anybody point out how I can extract the names, which are inside the values, from the resulting clusters?
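To make the setup concrete, here is a minimal plain-Java sketch of the key/value split described above. It is not the actual clustering job and uses no clustering library: the class `DrinkRecords`, its methods, and the assumption that the name is the first comma-separated field of the value are hypothetical, for illustration only. It shows that the name is still present in the value text before that text is turned into a vector.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
public class DrinkRecords {
    // Index of the Drinks column in "Name,Gender,Age,Drinks,Country".
    static final int DRINKS_COLUMN = 3;

    // Splits one CSV line into key = Drinks and value = the remaining fields.
    static Map.Entry<String, String> toKeyValue(String csvLine) {
        String[] fields = csvLine.split(",");
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (i != DRINKS_COLUMN) {
                rest.add(fields[i]);
            }
        }
        return new SimpleEntry<>(fields[DRINKS_COLUMN], String.join(",", rest));
    }

    // Assumption: the name is the first field of the value text.
    static String nameOf(String value) {
        return value.split(",", 2)[0];
    }

    public static void main(String[] args) {
        String[] lines = {
            "John,M,30,Pepsi,US",
            "Jack,M,25,Coke,US",
            "Ted,M,37,Limca,CAN"
        };
        for (String line : lines) {
            Map.Entry<String, String> kv = toKeyValue(line);
            System.out.println(kv.getKey() + " -> " + kv.getValue()
                    + " (name: " + nameOf(kv.getValue()) + ")");
        }
    }
}
```

What I am missing is how to carry that name along with the vector through the clustering step, so that it can be printed next to the cluster id.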