I am fairly new to clustering and am aware that there are various APIs out there that provide clustering algorithms as well as evaluations.

My objective is to cluster documents (by the contents of each file) and then generate a topic for each resulting cluster.

I currently use Apache Mahout for the clustering (with LDA) and Mallet for generating the topic of each cluster.

I now have to implement several other clustering algorithms and compare them against LDA, evaluating the performance of each one, in order to justify that LDA is a suitable algorithm for my work.

From what I have read, evaluating clustering algorithms involves internal or external evaluation, with different criteria as required. However, it seems that different criteria/evaluation metrics are to be used for different algorithms.
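To illustrate the kind of external evaluation I mean, here is a minimal sketch of purity, which only needs the cluster assignment of each document and a gold label for it. It assumes that every algorithm's output can be reduced to one cluster id per document (for LDA, for example, by taking the most probable topic) and that a labelled subset of documents exists. The class name `ClusterPurity` and the integer encoding of clusters and labels are my own choices for illustration and do not come from Mahout or Mallet.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterPurity {

    /**
     * Purity: for each cluster, count its most frequent gold label,
     * sum those counts over all clusters, and divide by the number of documents.
     * clusterIds[i] and goldLabels[i] both refer to document i (assumed encoding).
     */
    public static double purity(int[] clusterIds, int[] goldLabels) {
        if (clusterIds.length != goldLabels.length || clusterIds.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        // cluster id -> (gold label -> number of documents)
        Map<Integer, Map<Integer, Integer>> counts = new HashMap<>();
        for (int i = 0; i < clusterIds.length; i++) {
            counts.computeIfAbsent(clusterIds[i], k -> new HashMap<>())
                  .merge(goldLabels[i], 1, Integer::sum);
        }
        int majoritySum = 0;
        for (Map<Integer, Integer> labelCounts : counts.values()) {
            int best = 0;
            for (int c : labelCounts.values()) {
                best = Math.max(best, c);
            }
            majoritySum += best; // size of the majority label in this cluster
        }
        return (double) majoritySum / clusterIds.length;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: six documents, two clusters, two gold labels.
        int[] clusters = {0, 0, 0, 1, 1, 1};
        int[] gold     = {0, 0, 1, 1, 1, 1};
        // 5 of the 6 documents carry the majority label of their cluster.
        System.out.println(purity(clusters, gold));
    }
}
```

Because the method looks only at assignments and labels, the same call could in principle be applied to the output of LDA, k-means, or any other algorithm, which is the kind of uniform comparison I am after.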

In my case, since I am using different algorithms for clustering, is there a suitable framework that would help me evaluate the performance of my cluster results? Or is there an alternative approach?

I have to do this in Java.