I am a beginner in data mining and want to cluster my movie data set to find groups of genres. My data set has 86 movies and 26 different genres. I would like to use clustering to group the movies into a few broad genres instead of 26, so that after running a clustering algorithm I am left with, say, 4 clusters, or whatever small number best suits the data. I have defined the data set as follows: M1 {G1, G2, ..., G26}, M2 {G1, G2, ..., G26}, and so on, where each of the genres G1, ..., G26 takes the value 0 (absent) or 1 (present). My next step is to run k-means clustering on this data, and I want to use a suitable distance function, for example one based on the Pearson correlation coefficient.
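A minimal MATLAB sketch of this layout, for concreteness. It uses a random placeholder matrix rather than the real data, and the variable names (X, nMovies, nGenres, noGenre) are illustrative assumptions.

```matlab
% Placeholder data only: random 0/1 values stand in for the real genre flags.
rng(0);                                    % fixed seed so the sketch is repeatable
nMovies = 86;
nGenres = 26;
X = double(rand(nMovies, nGenres) < 0.2);  % X(i,j) = 1 if movie i has genre j

% Assumption for this sketch: every movie has at least one genre.
noGenre = sum(X, 2) == 0;
X(noGenre, 1) = 1;

disp(size(X));     % rows are movies, columns are genres
disp(sum(X, 1));   % number of movies tagged with each genre
```

Each row of X corresponds to one movie (M1, M2, ...) and each column to one genre (G1, ..., G26).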
I am using MATLAB for my experiments. I tried k-means with k = 3, 4, 5, and 6, and I also ran hierarchical clustering.
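A run like the one described could be sketched roughly as follows. The exact calls are not stated, so everything here is an assumption: the placeholder data, the default k-means distance (squared Euclidean), the 'average' linkage method, the cut at 4 clusters, and all variable names. It requires the Statistics and Machine Learning Toolbox.

```matlab
% Placeholder 86-by-26 binary matrix (not the real data).
rng(0);
X = double(rand(86, 26) < 0.2);
X(sum(X, 2) == 0, 1) = 1;          % assumption: no movie without a genre

% k-means for k = 3..6 with MATLAB's default distance (squared Euclidean).
ks = 3:6;
idxKmeans = zeros(size(X, 1), numel(ks));
for n = 1:numel(ks)
    % 'Replicates' reruns k-means from several random starts and keeps the best
    idxKmeans(:, n) = kmeans(X, ks(n), 'Replicates', 5);
end

% Hierarchical clustering: average linkage on Euclidean distances,
% then cut the tree into at most 4 clusters.
Z = linkage(X, 'average');
idxHier = cluster(Z, 'maxclust', 4);
```

Column n of idxKmeans holds the cluster label of each movie for k = ks(n), and idxHier holds the labels from the hierarchical tree.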
I am unsure how to determine which clustering results are better. How can I check that? As a beginner, I don't know how to plot clusters for binary features in MATLAB. I also don't know how to use the Pearson correlation coefficient as a distance metric in k-means. Please help.