I have to perform a cluster analysis on a big amount of data. Since I have a lot of missing values I made a correlation matrix.
corloads = cor(df1[,2:185], use = "pairwise.complete.obs")
Now I have problems how to go on. I read a lot of articles and examples, but nothing really works for me. How can I find out how many clusters are good for me?
I already tried this:
dissimilarity = 1 - corloads
distance = as.dist(dissimilarity)
plot(hclust(distance), main="Dissimilarity = 1 - Correlation", xlab="")
I got a plot, but its very messy and I dont know how to read it and how to go on. It looks like this:
Any idea how to improve it? And what can I actually get out of it?
I also wanted to create a Screeplot. I read that there will be a curve where you can see how many clusters are correct.
I also performed a cluster analysis and choose 2-20 Clusters, but the results are so long, I have no idea how to handle it and what things are important to look on.
kgs
function frommaptree
package. – patL