3 votes

I have generated a dendrogram with the plot() function after running hclust() for hierarchical clustering. I would now like to generate a scree plot for the same clustering. Any suggestions?

Your question could refer to several things. What metric of cluster quality do you want to see in the plot? Perhaps within-cluster SSE? - G5W
I am using Euclidean distance as the metric for my clusters. I have to generate a scree plot and find the kink point to determine the optimum number of clusters. - Dhruv Mehta
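(For reference, the within-cluster SSE mentioned above can be computed for a range of k by cutting the tree with cutree(). A minimal sketch, assuming Euclidean distance and using the built-in USArrests data as a stand-in:)

# cluster, then compute within-cluster SSE for k = 1..10
hc  <- hclust(dist(USArrests), method = "ward.D2")
wss <- sapply(1:10, function(k) {
  cl <- cutree(hc, k = k)
  sum(sapply(split(USArrests, cl), function(g)
    sum(scale(g, scale = FALSE)^2)))   # SSE around each cluster mean
})
plot(1:10, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-cluster SSE")

The kink in this curve plays the same role as the kink in a height-based scree plot.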

3 Answers

2 votes

It's a little late, but I have an answer.

# creating a dissimilarity matrix
res.dist <- dist(USArrests, method = "euclidean")

# creating an object of class "hclust"
res.hc <- hclust(d = res.dist, method = "ward.D2")

As the documentation for hclust explains, the returned object is a list of values. You can inspect them by using

View(res.hc)
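
For a quick look at just the heights (a small addition, assuming the objects created above):

# one merge height per merge, sorted in increasing order
str(res.hc$height)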

Now, the height component contains exactly what is needed for a scree plot. The following code generates one:

library(ggplot2)
library(tibble)

ggplot(tibble(height = res.hc$height,
              groups = length(res.hc$height):1),
       aes(x = groups, y = height)) +
    geom_point() +
    geom_line()

Basically, you plot the merge height against the number of groups. (It might not be very elegant; I'd be delighted to hear shorter versions that generate the same outcome. One base-R possibility is sketched after the figure.)

My outcome is:

[scree plot: merge height on the y-axis, falling as the number of groups increases]
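
As one possible shorter version (an addition, not from the original answer), the same plot can be drawn in base R:

# same scree plot in base R: merge height vs. number of groups
plot(length(res.hc$height):1, res.hc$height, type = "b",
     xlab = "groups", ylab = "height")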

0 votes
This plots a scree of the eigenvalues of the correlation matrix together with a parallel analysis (a factor-analysis style scree plot) rather than the dendrogram heights; mydata is a placeholder for your numeric data frame.

library(nFactors)
ev <- eigen(cor(mydata))                  # get eigenvalues
ap <- parallel(subject = nrow(mydata), var = ncol(mydata),
               rep = 100, cent = .05)     # parallel analysis
nS <- nScree(x = ev$values, aparallel = ap$eigen$qevpea)
plotnScree(nS)
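
For example (an assumption, since the answer leaves mydata undefined), any all-numeric data frame works as input:

# stand-in for the `mydata` placeholder, e.g. the built-in USArrests
mydata <- USArrests
# ...then run the snippet above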
0 votes

Please take a look at the YouTube link here, which should be of use:

https://www.youtube.com/watch?v=aMYCFtoBrdA


Link to the R code on Google Drive for download:

https://drive.google.com/file/d/0Byo-GmbU7XciVGRQcTk3QzdTMjA/view?usp=sharing


R code

#-----------------------------------------------
# Hierarchical clustering with the sample data
#------------------------------------------------


# Read data into R (similar to a SAS CARDS step)

temp_str <- "Name physics math
P 15 20
Q 20 15
R 26 21
X 44 52
Y 50 45
Z 57 38
A 80 85
B 90 88
C 98 98"

base_data <- read.table(textConnection(
  temp_str), header = TRUE)
closeAllConnections()

# Check the structure of the variables using the str() function
str(base_data)

# Plot the data, coloring points by Name
# (factor() is needed because read.table now reads Name as character)
plot(base_data$physics, base_data$math,
     pch = 21,
     bg = rep(c("red", "green3", "blue"), 3)[as.integer(factor(base_data$Name))],
     main = "Base Data")




# Step 01 - obtain the distance matrix (on the two numeric columns)
my_dist <- dist(base_data[c(2,3)], method = "euclidean")
print(my_dist)

# Step 02 - apply hierarchical clustering
fit <- hclust(my_dist, method="ward.D2")

# Step 03 - display the dendrogram
plot(fit, labels = base_data$Name)


# Scree plot from the merge heights (height 0 prepended as the starting point)
Dendrogram_Height <- c(0, fit$height)
plot(1:9, Dendrogram_Height, type="b", xlab="Sequence of merging",
     ylab="Dendrogram Height")
plot(9:1, Dendrogram_Height, type="b", xlab="# of clusters",
     ylab="Dendrogram Height")




# Step 04 - draw the dendrogram with colored cluster borders
# One can use this step to watch how the clusters merge
rect.hclust(fit, k=8, border="red")
plot(fit, labels = base_data$Name)
rect.hclust(fit, k=7, border="red")
plot(fit, labels = base_data$Name)
rect.hclust(fit, k=6, border="red")

# draw colored borders around the required clusters
plot(fit, labels = base_data$Name)
rect.hclust(fit, k=3, border="blue")

# cut tree into 3 clusters
my_groups <- cutree(fit, k=3)
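
A small usage note (not in the original code): the cluster labels returned by cutree() can be inspected and attached back to the data, e.g.:

# inspect cluster sizes and attach labels to the data
table(my_groups)
base_data$cluster <- my_groups
print(base_data)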