I'm an Apache Mahout newbie. I'm trying to understand which of my named vectors belong to which cluster. Many resources online deal with text documents and use the clusterdump command. However, my dataset is very large, and running clusterdump always fails with a Java out-of-memory error. Besides, I don't think clusterdump would answer my question.
I would like to know whether it's possible to find out just which named vectors belong to which clusters, using only the directories clusteredPoints, clusters-[0-9]+ and clusters-*-final.
If it helps: so far, I have formed clusters of users based on their song-listening habits. To do this, I first created a sequence file of NamedVectors, where the name of each NamedVector is the userId and the vector itself is a double array containing the weights of the tags of the songs the user listened to (an example is below).
AR2TSU61187FB5C4F0 0.5 0.2 0.7 0.0 0.0 0.1 0.0 0.0 ...
...
...
...
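To make the input format concrete, here is a minimal sketch in plain Java (no Mahout or Hadoop dependency) of how one such line splits into a userId and a weight array. The class name UserVectorLine and the method parse are hypothetical names used only for illustration, and the sample line is cut off after the first eight weights shown above. In a Mahout job, the resulting pair would presumably be wrapped in a NamedVector before being written to the sequence file; that step is not shown here.

```java
import java.util.Arrays;

// Hypothetical helper: holds one parsed input line (userId + tag weights).
class UserVectorLine {
    final String userId;
    final double[] weights;

    UserVectorLine(String userId, double[] weights) {
        this.userId = userId;
        this.weights = weights;
    }

    // Splits a whitespace-separated line: first token is the userId,
    // every remaining token is parsed as a double weight.
    static UserVectorLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected a userId followed by at least one weight");
        }
        double[] weights = new double[tokens.length - 1];
        for (int i = 1; i < tokens.length; i++) {
            weights[i - 1] = Double.parseDouble(tokens[i]);
        }
        return new UserVectorLine(tokens[0], weights);
    }

    public static void main(String[] args) {
        // Sample line truncated to the first eight weights from the example.
        UserVectorLine row = parse("AR2TSU61187FB5C4F0 0.5 0.2 0.7 0.0 0.0 0.1 0.0 0.0");
        System.out.println(row.userId + " -> " + Arrays.toString(row.weights));
    }
}
```

The point is only that the userId is the name I expect to see attached to each point in the clustering output.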
I then ran k-means successfully. The output is in the directory clusteredPoints (some 88 files with names such as part-m-00088) and in the directory clusters, which I believe contains the centroids.
Thanks for any help!