
Hi, I have a data set in this form:

12347,23.75580119032886
12348,57.97548386358446
12349,24.076027347616954
12350,19.670588100657742
12352,16.267473592256245

where the first column is a user's ID and the second is the total value of that user's purchases. I'm using the KMeans algorithm in Mahout to divide the data set into 3 clusters. My problem is that the ID column is also being used as a feature, so the output is wrong. Is there any way to ignore the first column and cluster only on the second one? Thanks.
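To illustrate what clustering on the second column alone means, here is a minimal plain-Python sketch. It is not Mahout and not Mahout's KMeans implementation; it swaps in a simple 1-D Lloyd's k-means, and the name `kmeans_1d` and the fixed random seed are illustrative assumptions. Each row's ID is kept only as a label, so it never enters the distance calculation.

```python
import random

def kmeans_1d(values, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on scalar values; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k distinct data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignments == assignments:
            break  # converged: no value changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

data = """12347,23.75580119032886
12348,57.97548386358446
12349,24.076027347616954
12350,19.670588100657742
12352,16.267473592256245"""

# Keep the ID only as a label; cluster on the purchase value alone.
rows = [(uid, float(val)) for uid, val in (line.split(",") for line in data.splitlines())]
values = [val for _, val in rows]
centroids, labels = kmeans_1d(values, k=3)
for (uid, val), label in zip(rows, labels):
    print(uid, val, "-> cluster", label)
```

Because only `values` is passed to `kmeans_1d`, the user IDs cannot influence which cluster a row lands in; they are re-attached afterwards purely for reporting.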


1 Answer


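Use a map-reduce job to preprocess the data first: for each line, drop the ID from the features and emit a vector containing only the purchase value. You can keep the ID as the record key (or as the vector's name, e.g. with Mahout's `NamedVector`) so you can still tell which user ended up in which cluster. Then run KMeans on those value-only vectors.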
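A minimal sketch of that map step, assuming Hadoop Streaming with a Python mapper; the function names `map_line`, `mapper_output` and `main` are hypothetical. The ID becomes the output key and the value part holds only the purchase amount. Mahout's KMeans typically reads its input as vectors stored in SequenceFiles, so this tab-separated text output would still need converting into Mahout vectors; that conversion is not shown here.

```python
import sys

def map_line(line):
    """Map one 'id,value' record to (id, feature_vector).

    The ID is kept only as the key/label; the feature vector holds just the
    purchase value, so a distance measure never sees the ID.
    Returns None for blank or malformed lines.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    user_id, value = parts
    try:
        return user_id, [float(value)]
    except ValueError:
        return None

def mapper_output(lines):
    """Yield 'id<TAB>value' strings for every well-formed input line."""
    for line in lines:
        mapped = map_line(line)
        if mapped is None:
            continue  # skip blank or malformed records
        user_id, features = mapped
        yield user_id + "\t" + " ".join(repr(x) for x in features)

def main():
    # Hadoop Streaming mappers read records on stdin and write key<TAB>value lines to stdout.
    for out in mapper_output(sys.stdin):
        print(out)
```

When running the script as the streaming mapper, call `main()` at the bottom of the file. Keeping the parsing in `map_line` separate from the I/O makes it easy to check the mapping on a few sample lines before submitting the job.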