When I run k-means clustering in Mahout I get two folders, clusters-x and clusteredPoints.

I have read the cluster centers using the cluster dumper, but I can't work out how to read clusteredPoints. Specifically, I need to do it from code.

The strange thing is that the file in clusteredPoints is always 128 bytes. When I try to loop through the results using the following code, it exits the loop immediately, as if there were no results. I do get the cluster centers, though, which leads me to assume that the points were clustered.

    IntWritable key = new IntWritable();
    WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
    while (reader.next(key, value)) {
        System.out.println(
        value.toString() + " belongs to cluster " + key.toString());
    }

Why does it skip the loop body entirely?

It is really strange, any help would be great, thanks.

1 Answer

You need to open up your final cluster file ('clusteredPoints/part-m-0') with:

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("output/clusteredPoints/part-m-0"), conf);

Then, assuming your keys are ints, iterate through it (as you already did) with:

    IntWritable key = new IntWritable();
    WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
    while (reader.next(key, value)) {
        LOG.info("{} belongs to cluster {}", value.toString(), key.toString());
    }
    reader.close();

I can post a fully working example if you still have trouble doing this.
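
Since the file in the question is only 128 bytes, it may be worth checking first whether it holds any records at all. A SequenceFile starts with the magic bytes `SEQ` followed by a version byte and a header, so a file of that size is probably little more than the header, and `reader.next(...)` would then return false on the first call. Below is a minimal sketch of such a check using only the JDK. It assumes you have copied the part file to the local disk (for example with `hadoop fs -get`). The class `SequenceFileProbe`, its method names, and the 256-byte threshold are my own illustrative choices, not part of Hadoop or Mahout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not a Hadoop or Mahout class): inspects a local copy of a part file. */
final class SequenceFileProbe {

    // Assumed threshold: a file this small can hold little more than a header.
    static final long HEADER_ONLY_LIMIT = 256;

    /** Returns true if the bytes start with the SequenceFile magic 'S', 'E', 'Q'. */
    static boolean hasSeqMagic(byte[] head) {
        return head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q';
    }

    /** Reads up to the first four bytes of the file (magic plus version byte). */
    static byte[] readHead(Path file) throws IOException {
        byte[] buf = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // end of file reached before four bytes were read
                }
                n += r;
            }
        }
        return Arrays.copyOf(buf, n);
    }

    /** Gives a rough, size-based guess about the file's contents. */
    static String describe(Path file) throws IOException {
        long size = Files.size(file);
        if (!hasSeqMagic(readHead(file))) {
            return "not a SequenceFile (" + size + " bytes)";
        }
        return size <= HEADER_ONLY_LIMIT
                ? "probably header only (" + size + " bytes)"
                : "likely contains records (" + size + " bytes)";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SequenceFileProbe.java <local copy of a clusteredPoints part file>
        System.out.println(describe(Paths.get(args[0])));
    }
}
```

If this reports a header-only file, the reading loop is probably fine and the clustering run itself wrote no points, so that is the place to look.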