
The problem: in R, I need to plot a dendrogram and cut the associated tree, starting from a linkage matrix created in a different language. Because of the nature of the dataset, the prior processing is only available in that other language, so I need to be able to work in R from an already determined linkage matrix.

I have a linkage matrix and a correlation matrix, both created in that other language. I saved both as CSV files and can read either into R as a data frame.

My approach: I wanted to convert the linkage matrix to an hclust object in R, so that I could pass it to as.dendrogram and subsequently use cutree.

When I run as.hclust(df), I get the error:

Error in as.hclust.default(df) : argument 'x' cannot be coerced to class “hclust” Consider providing an as.hclust.data.frame() method

As far as I can tell, as.hclust only takes a dist, diana, or agnes object, and I have been unable to convert the data frame to any of these in order to proceed with my downstream analysis.

An alternative would be to work with the correlation matrix, but I don't see a way to recover the physical distances from which to build a meaningful dendrogram.

I could use scipy.cluster.hierarchy.cut_tree in Python, but there are documented issues with that function that remain unresolved, so I would rather use R.

many thanks


1 Answer


I'm not sure what you would call the "linkage matrix", or whether there is a "standard" format for one across packages, but in cases like this it helps to use str:

x <- matrix(rnorm(30), ncol = 3)
hc <- hclust(dist(x), method = "complete")
str(hc)
List of 7
    $ merge      : int [1:9, 1:2] -5 -6 -8 -4 -2 -3 -1 6 5 -7 ...
    $ height     : num [1:9] 0.714 0.976 1.381 1.468 2.065 ...
    $ order      : int [1:10] 2 6 10 3 8 5 7 1 4 9
    $ labels     : NULL
    $ method     : chr "complete"
    $ call       : language hclust(d = dist(x), method = "complete")
    $ dist.method: chr "euclidean"
    - attr(*, "class")= chr "hclust"

So, from this, one can deduce that an hclust object is a simple S3 structure (a list with a class attribute), and it should be possible to create an imitation from your already-determined step-by-step data like this:

my_hc <- list(
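    merge = <your data>,   # (n - 1) x 2 integer matrix: negative = observation, positive = earlier merge row
    height = <your data>,  # numeric vector of length n - 1, one height per merge
    order = <your data>,   # permutation of 1:n giving the left-to-right leaf order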
    labels = NULL,
    method = "complete",
    call = "some_optional_string",
    dist.method = "your_custom_distance"
)
class(my_hc) <- "hclust"
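
To make the imitation concrete, here is a minimal sketch under one big assumption: that the linkage matrix follows the SciPy convention (n - 1 rows; columns are the two merged cluster ids, the merge distance, and the cluster size; ids are 0-based, with n + i denoting the cluster formed in row i). If your matrix uses a different layout, the index translation will need adjusting. The helper name linkage_to_hclust is made up for this example, and "unknown" is just a placeholder for the method fields.

```r
# Hypothetical helper: build an hclust object from a SciPy-style linkage matrix.
# Assumption: Z has n - 1 rows and columns (id a, id b, distance, size), with
# 0-based ids where 0..n-1 are observations and n + i is the cluster from row i.
linkage_to_hclust <- function(Z, labels = NULL) {
  Z <- as.matrix(Z)
  n <- nrow(Z) + 1L
  ids <- Z[, 1:2, drop = FALSE]

  # R convention: negative = observation (1-based), positive = earlier merge row
  merge <- ifelse(ids < n, -(ids + 1), ids - n + 1)
  storage.mode(merge) <- "integer"
  dimnames(merge) <- NULL

  # Leaf order: collect the members of each merged cluster, left to right;
  # the members of the final merge give the plotting order
  members <- vector("list", n - 1L)
  for (i in seq_len(n - 1L)) {
    parts <- lapply(merge[i, ], function(j) if (j < 0) -j else members[[j]])
    members[[i]] <- unlist(parts)
  }

  structure(
    list(merge = merge, height = as.numeric(Z[, 3]),
         order = members[[n - 1L]], labels = labels,
         method = "unknown", call = match.call(),
         dist.method = "unknown"),
    class = "hclust"
  )
}

# Toy linkage matrix for 4 observations (made-up numbers, not real data)
Z <- rbind(c(0, 1, 1.0, 2),
           c(2, 3, 2.0, 2),
           c(4, 5, 3.0, 4))
hc <- linkage_to_hclust(Z)
dend <- as.dendrogram(hc)    # plot(dend) would draw the tree
groups <- cutree(hc, k = 2)  # cluster membership for 2 groups
```

With a data frame read from CSV, passing it straight in should work as long as every column is numeric, since the helper calls as.matrix first. Note that cutree with h = ... expects the heights to be sorted increasingly, which holds for the usual monotone linkage methods but is worth checking on your data.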

Otherwise, you could let R redo the clustering from a distance matrix, if one is available and the computation is feasible.
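
If only the correlation matrix is at hand, one possible route is to treat 1 - r as a dissimilarity and cluster on that. This is a sketch, not a way to recover the original distances: the resulting tree will generally differ from the one built in the other language unless that tool used the same dissimilarity and linkage. The simulated cor_mat below stands in for the matrix read from CSV, and the "average" method and k = 3 are arbitrary choices for illustration.

```r
# Simulated stand-in for the correlation matrix read from CSV
set.seed(1)
cor_mat <- cor(matrix(rnorm(200), ncol = 8))

# Assumption: 1 - r is an acceptable dissimilarity for this data
d <- as.dist(1 - cor_mat)

# Let R redo the clustering, then cut the tree as usual
hc2 <- hclust(d, method = "average")
groups2 <- cutree(hc2, k = 3)
```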