I am using identify to explore specific features of clusters in a dendrogram in R. identify works fine on an 'hclust' object, but I need it for a horizontal dendrogram of class 'dendrogram' rather than 'hclust'. I have the dendextend package installed, which should extend identify to objects of class 'dendrogram' and to horizontal dendrograms (http://rpackages.ianhowson.com/cran/dendextend/man/identify.dendrogram.html). For my dataset, identify works for a vertical dendrogram (of class 'dendrogram') but not for a horizontal one. The error I always get is:

Error in rect.dendrogram(x, k = k, x = X$x, cluster = cluster[, k - 1],  : 
k must be between 2 and 10

Here is a simplified, reproducible example:

#Install packages
install.packages(c("TraMineR","dendextend"))
#Load packages
library(TraMineR)
library(dendextend)

#Create fake dataset (each row is a sequence of characters)
a <- c(rep('A', 50), rep('B', 50))
seqdf <- rbind(a = a, b = sample(a), c = sample(a), d = sample(a), e = sample(a), f = sample(a),
               g = sample(a), h = sample(a), i = sample(a), j = rep('A', 100), k = rep('B', 100),
               l = sample(a))
colnames(seqdf) <- paste0('a', 1:100)

#Turn it into a sequence object 
seq_def <- seqdef(seqdf, 1:100, id = rownames(seqdf), xtstep = 4)

#Calculate the dissimilarity (hamming distance) between sequences 
hd <- seqdist(seq_def, method = "HAM", with.missing = TRUE)
rows<-list(rownames(seqdf),rownames(seqdf))
dimnames(hd) <- rows
#Perform Ward clustering on dissimilarity matrix hd
ward <- hclust(as.dist(hd), method = "ward.D2")     
#Dendrogram object
dend <- as.dendrogram(ward) 

#Horizontal dendrogram 
plot(dend, horiz=TRUE)
identify(dend, horiz=TRUE) # HERE IDENTIFY GIVES AN ERROR

#Vertical dendrogram
plot(dend)
identify(dend) # this works, there is no error

Hope somebody knows how to solve this problem.

Best,

1 Answer

This is general behavior of the identify function (for example, identify.hclust) when you click "too close" to the edges of the screen. You can see it by running the following and clicking near the leaves:

plot(ward)
identify(ward, MAXCLUSTER = 12) 

I agree that this is somewhat annoying behavior, since we don't always manage to click exactly where we intended. So I've added a new parameter to the dendextend package, stop_if_out, which is set to FALSE by default for identify.dendrogram. With this default, the function no longer stops when you click too far outside the dendrogram, in both vertical and horizontal plots.

It will probably take some time before I release this version to CRAN, but you can easily get it now by using devtools and running:

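# Install a package only if it cannot be loaded (character.only = TRUE is needed because pkg holds the name as a string)
install.packages.2 <- function(pkg) if (!require(pkg, character.only = TRUE)) install.packages(pkg)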
install.packages.2('devtools')
# make sure you have Rtools installed first! if not, then run:
#install.packages('installr'); install.Rtools()
devtools::install_github('talgalili/dendextend')
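
Once the development version is installed, a quick check like the one below may confirm that stop_if_out is available before retrying the horizontal plot. This is only a sketch: the small USArrests dendrogram is a stand-in for the dend object from the question, and has_stop_if_out is just an illustrative variable name.

```r
library(dendextend)

# Look up the method that identify() dispatches to for 'dendrogram' objects
identify_dend <- getS3method("identify", "dendrogram")

# TRUE if the installed dendextend exposes the stop_if_out argument
has_stop_if_out <- "stop_if_out" %in% names(formals(identify_dend))
print(has_stop_if_out)

# Stand-in data (an assumption, not the sequences from the question)
dend <- as.dendrogram(hclust(dist(USArrests[1:10, ]), method = "ward.D2"))

# identify() waits for mouse clicks, so only run it in an interactive session
if (interactive() && has_stop_if_out) {
  plot(dend, horiz = TRUE)
  identify(dend, horiz = TRUE, stop_if_out = FALSE)
}
```

If the check prints FALSE, the installed version most likely predates the change, and reinstalling from GitHub as shown above should be the next step.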

I hope this helps.