0
votes

I have a set of 9 directed graphs of differing sizes and I'd like to use graph clustering to create a dendrogram illustrating their structural similarity, similar to what's done in the NetConfer network comparison tool: https://web.rniapps.net/netconfer/. (The reasons I'm not just using this tool are that currently it can only handle 8 graphs, I'd like to be able to use other techniques for determining graph similarity than the "edge-Jaccard" method that it uses, and I'd like to be able to do this in R for reproducibility).

From what I've read, graph kernels seem to be a well-regarded way of quantifying similarity between graphs, and the package graphkernels makes many graph kernel methods available in R (https://www.rdocumentation.org/packages/graphkernels/versions/1.6). The package promises that "Using the kernel matrices computed by the package, we can easily perform tasks such as classification, regression and clustering on graph-structured samples" [1].

However, I'm not sure how to use the kernel matrix output by the methods for clustering, and I can't find any tutorials out there on it.

I've been trying the "K Step Random Walk Kernel" as recommended in [2], and from what I understand the kernel matrix output by this method should show counts of the number of walks shared by each pair of graphs in the set. So graphs with larger numbers of shared walks should have a more similar structure and so be placed closer together on a dendrogram. However, I'm not sure how to perform hierarchical clustering on such a kernel matrix object to create a dendrogram.

Here is some example code showing the method and its output:

  library(graphkernels)
  
  data(mutag)
  mutag_samp <- mutag[1:9]
  
  K <- CalculateKStepRandomWalkKernel(mutag_samp, rep(1, 2))
  K

Any help would be appreciated, and apologies if I'm just ignorant of something basic. Also, any suggestions or recommendations for other approaches to graph clustering that might better suit my directed graphs would be welcome.

[1] Sugiyama et al. (2018). graphkernels: R and Python packages for graph comparison. Bioinformatics 34, 530–532. https://doi.org/10/gb38s9

[2] Sugiyama & Borgwardt (2015). Halting in Random Walk Kernels.

1

1 Answers

0
votes

The function CalculateKStepRandomWalkKernel you tried outputs a kernel matrix K, where each K[i, j] represents a similarity between i-th graph and j-th graph.

You can use any kernel-based method using K such as kernel k-means: https://www.rdocumentation.org/packages/kernlab/versions/0.9-29/topics/kkmeans

For example, after obtaining K, run

res <- kkmeans(as.kernelMatrix(K), 2)

to obtain two clusters.

You can also try kernel-based hierarchical clustering, for example, I found the following literature: http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Qin2003Kernel.pdf