
I have a mixed-type data set, Data_String (947 x 41), that contains both numeric and categorical attributes.

I produced a 947 x 947 distance matrix in RStudio using the daisy() function with the Gower distance measure.

library(cluster)  # provides daisy() and diana()
d <- daisy(Data_String, metric = "gower", stand = FALSE, type = list(symm = c("V1", "V13"), asymm = c("V8", "V9", "V10")))

I then applied hierarchical clustering to the dissimilarity matrix (d).

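# hclust (agglomerative, complete linkage)
hc <- hclust(d, method = "complete")
plot(hc)                     # dendrogram
rect.hclust(hc, 4)           # draw boxes around 4 clusters
cut <- cutree(hc, k = 1:5)   # cluster labels for k = 1..5, one row per observation
View(cut)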

# diana (divisive hierarchical clustering)
d_as <- as.matrix(d)
DianaCluster <- diana(d_as, diss = TRUE, keep.diss = TRUE)
print(DianaCluster)
plot(DianaCluster)

These are the plots I got:

Diana_plot

hclust_plot

** Note: I couldn't upload the images here since I do not have enough reputation points.

I am struggling to understand the results. Can anyone please:

1- suggest an approach I can apply in R to make the results easier to understand,

or

2- explain how I can link the results back to my source data, since they are all based on the dissimilarity matrix?

Read a book about clustering methods? Or see this link or this to get you started. - ekstroem
Thank you so much. I realised that the plot would be too messy to interpret, so I used the cutree() function to get a list of clusters instead of a tree plot. - user3895291

1 Answer


Please take a look at https://stats.stackexchange.com/questions/130974/how-to-use-both-binary-and-continuous-variables-together-in-clustering

It explains how to use a Gower dissimilarity matrix with hclust. Hope this helps!
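
On the second point in the question (linking the clusters back to the source data), here is a minimal sketch. It relies on cutree() returning one label per observation, in the same order as the rows that went into daisy(), so the labels can simply be added as a column. The data frame `toy` and its columns (age, income, group) are made up for illustration and only stand in for Data_String; k = 2 is likewise an arbitrary choice for this toy data.

```r
library(cluster)  # provides daisy()

# Hypothetical stand-in for Data_String: a small mixed-type data frame.
set.seed(1)
toy <- data.frame(
  age    = c(rnorm(10, mean = 25, sd = 2), rnorm(10, mean = 60, sd = 2)),
  income = c(rnorm(10, mean = 30, sd = 3), rnorm(10, mean = 90, sd = 3)),
  group  = factor(rep(c("a", "b"), each = 10))
)

# Same pipeline as in the question: Gower dissimilarities, then hclust.
d_toy  <- daisy(toy, metric = "gower")
hc_toy <- hclust(d_toy, method = "complete")

# One cluster label per row of `toy`, in row order, attached as a new column.
toy$cluster <- cutree(hc_toy, k = 2)

# Describe each cluster in terms of the original variables.
print(table(toy$cluster))                                        # cluster sizes
print(aggregate(cbind(age, income) ~ cluster, data = toy, FUN = mean))  # numeric means
print(table(toy$cluster, toy$group))                             # categorical breakdown
```

With the real data, the same idea would be something like Data_String$cluster <- cutree(hc, k = 4), followed by summaries of the numeric and categorical columns per cluster. That is usually easier to read than a dendrogram with 947 leaves.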