
I am running into an issue where I can plot a vertical dendrogram with labels but I can't add labels when it is horizontal.

My Data looks like this:

Company Industry1 Industry2 Industry3
Google     3%        5%        6%
Apple      2%        6%        1%

When I import the data, the first column contains my labels, but the row names are just 1, 2, 3, etc.

My code is below. The data source is called Cluster_D:

labs = Cluster_D[, 1]
Industry <- Cluster_D
rownames(Industry) <- labs$`Company`


D.Industry <- dist(scale(round(Industry[, -1], 3)), method = "euclidean")
H.Industry <- hclust(D.Industry, method = "ward.D")
plot(H.Industry, labels = Cluster_D$`Company`)

So I assign my labels to the variable "labs". I then place my data into another variable, "Industry". Once I plot the data and pass in the labels, I get the chart with the clusters I need. The chart works vertically with labels, but...

I have no idea how to flip this chart to horizontal and keep the label names. I tried the as.dendrogram function, which lets me use horiz = TRUE, but I can't keep my labels; they revert to 1, 2, 3, etc.

Can anyone explain how I can correct this? I am used to Statistica, where I didn't have any issues doing hierarchical clustering, and I am trying to pick up R. I feel like it should be super easy to assign labels, but I just don't know how.

I tried the code below, but the chart is mislabeled (the labels come out in ABC order).

F.Industries <- as.dendrogram(H.Industry)
labels(F.Industries) <- paste(as.character(Cluster_D[,1]))
plot(F.Industries, horiz = TRUE) 
Comments:

How do you scale a character vector? scale(c(3%, 2%))? If I supply numeric columns your code works for me. I get the labels in the horiz = T dendrogram. - missuse

scale(round(Industry[, -1], 3)) removes the character vector in column 1 before rounding and scaling. The real data might look like .1646970438683. Can I see the code you used to get it to work? I tried F.Industries <- as.dendrogram(H.Industry) and then plot(F.Industries, horiz = TRUE), but I don't get the labels, just the numeric row names. - PAR

1 Answer

As requested by PAR:

Data (I added one more row, IBM):

z <- read.table(text = "Company Industry1 Industry2 Industry3
Google     3%        5%        6%
Apple      2%        6%        1%
IBM        7%        4%        2%", header = T)

When I try:

scale(round(z[, -1], 3))
#output
Error in Math.data.frame(list(Industry1 = c(2L, 1L, 3L), Industry2 = c(2L,  : 
  non-numeric variable in data frame: Industry1Industry2Industry3

This means the sample data you provided is not representative of your own.

Convert to numeric:

z = data.frame("Company" = z[,1], apply(z[,-1], 2, function(x) as.numeric(gsub("%", "", x))))

The row names are used as labels for the leaves:

rownames(z) <- z[,1]

D.Industry <- dist(scale(z[, -1]), method = "euclidean")
H.Industry <- hclust(D.Industry, method = "ward.D")

plot(as.dendrogram(H.Industry), horiz = T)
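
If setting row names is not convenient, a possible alternative (a minimal sketch, not something from the original question) is to write the labels into the hclust object before converting it. The toy values and the names m, hc and dend below are assumptions made for illustration; the labels are given in the original row order, and hclust takes care of the leaf order.

```r
# Toy data mirroring the example above (percentages stored as plain numbers).
z <- data.frame(
  Company   = c("Google", "Apple", "IBM"),
  Industry1 = c(3, 2, 7),
  Industry2 = c(5, 6, 4),
  Industry3 = c(6, 1, 2)
)

# Cluster without row names, so the leaves would default to 1, 2, 3.
m <- scale(as.matrix(z[, -1]))
rownames(m) <- NULL
hc <- hclust(dist(m, method = "euclidean"), method = "ward.D")

# Attach the labels in the original row order.
hc$labels <- as.character(z$Company)

# Convert and plot horizontally; leave room on the right for the labels.
dend <- as.dendrogram(hc)
par(mar = c(2, 0, 0, 8))
plot(dend, horiz = TRUE)
```

Here labels(dend) should match z$Company reordered by order.dendrogram(dend), which is a quick way to check that no leaf was mislabeled.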


One can adjust the margins with the mar argument of par(), so the labels are not cut off:

par(mar=c(2, 0, 0, 8))
plot(as.dendrogram(H.Industry), horiz = T)


Other approaches include using the ape and ggdendro packages.
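
As a rough sketch of those two alternatives, assuming the packages are installed (the guards skip them otherwise) and using the same toy values as stand-ins for the real data:

```r
# Toy data; row names carry the company labels into dist() and hclust().
z <- data.frame(
  Industry1 = c(3, 2, 7),
  Industry2 = c(5, 6, 4),
  Industry3 = c(6, 1, 2),
  row.names = c("Google", "Apple", "IBM")
)
hc <- hclust(dist(scale(z), method = "euclidean"), method = "ward.D")

# ggdendro: rotate = TRUE draws the dendrogram horizontally.
if (requireNamespace("ggdendro", quietly = TRUE)) {
  print(ggdendro::ggdendrogram(hc, rotate = TRUE))
}

# ape: a phylo object is plotted left to right by default.
if (requireNamespace("ape", quietly = TRUE)) {
  plot(ape::as.phylo(hc))
}
```

Both take the labels from the hclust object, so they rely on the row names (or hc$labels) being set correctly beforehand.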