I have generated a dendrogram with the plot() function after running hclust() for hierarchical clustering. I am now looking to generate a scree plot for the same clustering. Any suggestions?
3
votes
Your question could refer to several things. What metric of cluster quality do you want to see in the plot? Perhaps within-cluster-SSE?
- G5W
I am using Euclidean distance as the distance metric for my clustering. I have to generate a scree plot and find the kink (elbow) point to determine the optimum number of clusters.
- Dhruv Mehta
3 Answers
2
votes
It's a little late, but I have an answer.
# creating a dissimilarity matrix
res.dist <- dist(USArrests, method = "euclidean")
# creating an object of class "hclust"
res.hc <- hclust(d = res.dist, method = "ward.D2")
As described in the documentation for hclust, the returned object is a list of components. You can inspect them with
View(res.hc)
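If you prefer to stay in the console, here is a minimal inspection sketch (assuming the USArrests example above, where 50 observations give 49 merge heights):
# list the components of the hclust object
names(res.hc)
# the merge heights used below for the scree plot
str(res.hc$height)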
Now, the height component has exactly what is needed for a scree plot. The following code generates one:
# requires the tidyverse packages tibble, dplyr and ggplot2
library(tibble)
library(dplyr)
library(ggplot2)

ggplot(res.hc$height %>%
         as_tibble() %>%
         add_column(groups = length(res.hc$height):1) %>%
         rename(height = value),
       aes(x = groups, y = height)) +
  geom_point() +
  geom_line()
Basically, what this does is plot the merge height against the number of groups. (It might not be very elegant; I'd be delighted to hear about shorter versions that generate the same outcome.)
My outcome is a scree plot of merge height against the number of groups (plot image not shown).
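For comparison, a shorter base-graphics sketch (not from the original answer) that plots the same height-versus-groups data:
plot(length(res.hc$height):1, res.hc$height, type = "b",
     xlab = "groups", ylab = "height")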
0
votes
Please take a look at the YouTube video linked here, which should be of use:
https://www.youtube.com/watch?v=aMYCFtoBrdA
Regards
Link to the R code on Google Drive for download: https://drive.google.com/file/d/0Byo-GmbU7XciVGRQcTk3QzdTMjA/view?usp=sharing
R code:
#-----------------------------------------------
# Hierarchical clustering with the sample data
#------------------------------------------------
# Reading data into R (similar to the CARDS statement in SAS)
temp_str <- "Name physics math
P 15 20
Q 20 15
R 26 21
X 44 52
Y 50 45
Z 57 38
A 80 85
B 90 88
C 98 98"
base_data <- read.table(textConnection(temp_str), header = TRUE)
closeAllConnections()
# Check distinct categories of variables using the str function
str(base_data)
# Plot data
# Plot data (Name is converted to a factor so it can index the color vector;
# read.table no longer creates factors by default in R >= 4.0)
plot(base_data$physics, base_data$math,
     pch = 21,
     bg = c("red", "green3", "blue", "red", "green3", "blue",
            "red", "green3", "blue")[as.integer(factor(base_data$Name))],
     main = "Base Data")
# Step 01- obtain the distance matrix (using only the numeric columns)
my_dist <- dist(base_data[c(2,3)], method = "euclidean")
print(my_dist)
# Step 02- Apply Hierarchical Clustering
fit <- hclust(my_dist, method="ward.D2")
# Step 03- Display dendrogram
plot(fit, labels = base_data$Name)
# Collect the merge heights, prepending 0 so there is one value per cluster count
Dendrogram_Height <- c(0, fit$height)
plot(1:9, Dendrogram_Height, type = "b", xlab = "Sequence of merging",
     ylab = "Dendrogram Height")
plot(9:1, Dendrogram_Height, type = "b", xlab = "# of clusters",
     ylab = "Dendrogram Height")
# Step 04- draw dendrogram with color borders
# One can use this step to inspect candidate numbers of clusters
rect.hclust(fit, k=8, border="red")
plot(fit, labels = base_data$Name)
rect.hclust(fit, k=7, border="red")
plot(fit, labels = base_data$Name)
rect.hclust(fit, k=6, border="red")
# draw color borders around the required clusters
plot(fit, labels = base_data$Name)
rect.hclust(fit, k=3, border="blue")
# cut tree into 3 clusters
my_groups <- cutree(fit, k=3)
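# Inspect the assignments (a sketch added for illustration): table() counts
# how many observations fall into each of the 3 clusters
print(my_groups)
table(my_groups)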
