I know that Mahout is designed for batch processing, but I would like to know whether, and how, I can use its KMeans to cluster individual points.
Let's say we have the following situation:
- Global clustering, which runs as a batch job over all the data and produces the centroids as its result
- Single-point clustering, which uses the centroids from the global clustering to assign a new point to a cluster. It does not require recomputing the centroids, only assigning the point to an existing cluster
Can I do this using Mahout, or do I have to implement it myself? I thought about setting the number of iterations to 1 and assigning the point that way, but the problem is that KMeans recomputes the cluster centroids, and if the new point is an outlier, it creates a new cluster from it. I don't want that. What I actually want is the closest existing centroid and the distance to it.
For now, it seems that KMeans itself is not really appropriate for this, and that the assignment step should be implemented separately. Is that correct?
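To make it concrete, the separate assignment step I have in mind would look roughly like the sketch below. It is plain Java and does not use any Mahout API. The class and method names (`NearestCentroid`, `assign`) are hypothetical, the centroids are assumed to be already exported from the batch run as `double[]` arrays, and Euclidean distance is assumed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: assigns one point to the
// nearest of a fixed set of centroids without recomputing them.
final class NearestCentroid {

    /** Index of the closest centroid and the distance to it. */
    static final class Assignment {
        final int clusterIndex;
        final double distance;

        Assignment(int clusterIndex, double distance) {
            this.clusterIndex = clusterIndex;
            this.distance = distance;
        }
    }

    /** Euclidean distance between two vectors of equal length (assumed metric). */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Linear scan over the centroids; the centroids themselves are never modified. */
    static Assignment assign(double[] point, List<double[]> centroids) {
        if (centroids.isEmpty()) {
            throw new IllegalArgumentException("no centroids");
        }
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = euclidean(point, centroids.get(i));
            if (d < bestDistance) {
                best = i;
                bestDistance = d;
            }
        }
        return new Assignment(best, bestDistance);
    }

    public static void main(String[] args) {
        // Made-up centroids standing in for the output of the global clustering.
        List<double[]> centroids = Arrays.asList(
                new double[] {0.0, 0.0},
                new double[] {10.0, 10.0});
        Assignment a = assign(new double[] {3.0, 4.0}, centroids);
        System.out.println("cluster=" + a.clusterIndex + " distance=" + a.distance);
        // prints "cluster=0 distance=5.0"
    }
}
```

With this approach an outlier would simply come back with a large distance, and I could apply my own threshold to it instead of having a new cluster created.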
Thanks