
I am doing some experiments with clustering, and now I want to visualize the data, as in <https://cwiki.apache.org/confluence/display/MAHOUT/Visualizing+Sample+Clusters>. Is there a way to run those classes with arguments so that they accept my own cluster data? What is the best way to look at cluster data?

The command I am using is: `mvn -q exec:java -Dexec.mainClass=org.apache.mahout.clustering.display.DisplayClustering`

Thank you

PS: I am using Mahout 0.9


Any realistic data set that can be visualized in two dimensions (and I don't think these classes can do much more than that) will easily fit into main memory. And if I'm not mistaken, these classes load all the data into memory anyway, because they are only meant for demonstration.
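To put "easily fits into main memory" into numbers, here is a back-of-the-envelope sketch. It assumes each coordinate is stored as a 64-bit double in a flat array and ignores per-object overhead (boxed objects in Java or Python take considerably more), so treat the result as a lower bound. The helper name `raw_bytes` is just for illustration.

```python
def raw_bytes(num_points: int, dims: int = 2, bytes_per_value: int = 8) -> int:
    """Lower bound on memory for num_points points with dims float64 coordinates."""
    return num_points * dims * bytes_per_value


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 100_000_000):
        # Report in megabytes (10^6 bytes).
        print(f"{n:>11,} points -> {raw_bytes(n) / 1e6:.0f} MB")
```

Even 100 million 2-D points need only about 1.6 GB of raw storage, well within a single workstation.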

So you may as well use a non-Hadoop tool such as ELKI, WEKA, or SciPy. Mahout only really pays off when you have more data than fits into main memory; otherwise, it will be much slower than a good single-host solution.
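As an illustration of how little is needed once everything is in memory on one host, here is a minimal pure-Python sketch of Lloyd's k-means on 2-D points. It is not the API of ELKI, WEKA, or SciPy; the function name `kmeans_2d` and the naive "first k points" initialization are illustrative assumptions. The returned labels can be fed to whatever plotting tool you prefer to color the points by cluster.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans_2d(points: List[Point], k: int,
              iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means on 2-D points held entirely in memory.

    Hypothetical helper: centroids start at the first k points, which is
    simple but not robust (k-means++ seeding would be a better choice).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Point] = list(points[:k])
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for (x, y), c in zip(points, labels):
            sums[c][0] += x
            sums[c][1] += y
            sums[c][2] += 1
        centroids = [
            (sx / n, sy / n) if n else centroids[c]  # empty cluster stays put
            for c, (sx, sy, n) in enumerate(sums)
        ]
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    centroids, labels = kmeans_2d(pts, k=2)
    print(labels)
    print(centroids)
```

The quadratic-looking inner loop is fine for data that fits in memory; dedicated tools add better seeding, indexing, and interactive plots on top of the same idea.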

See e.g. this G+ post:

If your data is small enough to fit in main memory, don't run Hadoop.