I have a large graph with 60k nodes. The graph is stored as a 60k x 60k distance matrix.

All of these distances are between 0 and 1 -- they are just 1 - cosine similarity.

I want to plot this graph using igraph and use auto thresholding so that large distance nodes do not form edges between them.

What is the best way to do this?

I tried a manual threshold of 0.1, removing all edges with distance greater than 0.1, but my computer hangs when I try to plot the result. Ideally, I would just specify the number of edges and have igraph plot the graph using only that many edges, connecting the closest pairs of nodes.

Thanks a lot.

You need to include some code. You can use a random graph that is similar to yours. Also, plotting a graph with that many vertices is usually not helpful unless you simplify the graph, by clustering the vertices or in some other way. - Gabor Csardi

1 Answer

First, I am a huge fan of the igraph package. It has been a lifesaver for me.

That being said, I had a similar problem with plotting large graphs in rigraph. My solution was to plot the graph using plot.default instead of plot.igraph. However, this requires some additional work: the edges and vertices have to be drawn by hand from a table of coordinates. Also, if the graph is large and the edge list is long, I would use data.table when transforming the igraph object into a rectangular array that can be plotted.
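
As a rough sketch of what that rectangular array looks like: one row per edge, holding the coordinates of both endpoints. This is not the data.table code from this answer but a simplified illustration; it assumes vertices are identified by integer ids that match the rows of the layout matrix, and the toy graph and the name `edge_xy` are made up for the example.

```r
library(igraph)

set.seed(1)
n      <- 100
g      <- sample_gnp(n, 0.05)
layout <- matrix(rnorm(n * 2L), nrow = n)  # row i holds the coordinates of vertex i

# edge list as integer vertex ids, one row per edge
el <- as_edgelist(g, names = FALSE)

# look up the coordinates of both endpoints by indexing the layout matrix
edge_xy <- data.frame(
    x1 = layout[el[, 1L], 1L],
    y1 = layout[el[, 1L], 2L],
    x2 = layout[el[, 2L], 1L],
    y2 = layout[el[, 2L], 2L]
)
```

The four columns map directly onto the `x0`, `y0`, `x1`, `y1` arguments of `segments()`.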

Below is a reproducible example:

library("tictoc")
library("magrittr")
library("igraph")
library("data.table")

set.seed(42)

# no of nodes
n = 2000
# layout
layout = matrix(rnorm(n * 2L), nrow = n)

# sample graph
g = sample_gnp(n, .05)

# plot.igraph
igraph_plot = function() {
    pdf("plot_igraph.pdf")
    plot(
        g, 
        vertex.label = NA, 
        vertex.size = 1.5, 
        vertex.color = "black",
        edge.color = scales::alpha("grey", .3), 
        layout = layout
    )
    dev.off()
}

# plot.default
base_plot = function() {
    
    pdf("plot_base.pdf")
    
    # get edge-list
    el = as.data.table(as_edgelist(g)) %>%
        setnames(c("x","y"))
    
    # add ids to layout
    d_layout = data.table(
        x_coord = layout[, 1L],
        y_coord = layout[, 2L],
        id      = 1:n
    )
    
    # add coordinates to edgelist endpoints
    el_w_layout = merge(
        el, 
        d_layout, 
        by.x = "x",
        by.y = "id",
        all.x = TRUE
    ) %>%
        setnames(
            c("x_coord", "y_coord"), c("x1", "y1")
        ) %>%
        merge(
            d_layout,
            by.x = "y",
            by.y = "id",
            all.x = TRUE
        ) %>%
        setnames(
            c("x_coord", "y_coord"), c("x2", "y2")
        )
    
    # plot frame plot.default
    plot(
        d_layout$x_coord, 
        d_layout$y_coord, 
        axes = FALSE, 
        type = "n", 
        xlab = NA, 
        ylab = NA
    )
    
    # add edges (segments() takes x0, y0, x1, y1: start and end points)
    segments(
        x0  = el_w_layout$x1,
        y0  = el_w_layout$y1,
        x1  = el_w_layout$x2,
        y1  = el_w_layout$y2,
        col = scales::alpha("grey", .3)
    )
    # add vertices
    points(d_layout$x_coord, d_layout$y_coord, pch = 19, cex = .5)
    dev.off()
}

Running this code gives me:

tic()
igraph_plot()
toc()
2.002 sec elapsed

and

tic()
base_plot()
toc()
0.277 sec elapsed
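
The question also asks about keeping only a fixed number of the closest pairs. Below is a minimal sketch of one way to do that, assuming the distance matrix fits in memory as a dense R matrix `d`. The toy data, `k_edges`, and the other variable names are made up for illustration. Enumerating all pairs like this would probably need too much memory for the full 60k x 60k matrix, so a chunked variant would be needed there.

```r
library(igraph)

set.seed(1)
n <- 200

# toy stand-in for the real data: cosine distances between
# non-negative random vectors, so all distances lie in [0, 1]
m <- matrix(runif(n * 10L), nrow = n)
m <- m / sqrt(rowSums(m^2))
d <- 1 - tcrossprod(m)            # d[i, j] = 1 - cosine similarity

k_edges <- 500L                   # hypothetical target number of edges

# visit each unordered pair once (upper triangle, diagonal excluded)
idx    <- which(upper.tri(d), arr.ind = TRUE)
pair_d <- d[idx]

# keep the k_edges pairs with the smallest distance
keep <- order(pair_d)[seq_len(k_edges)]

# start from n isolated vertices so that unconnected nodes are kept too
g_small <- add_edges(
    make_empty_graph(n, directed = FALSE),
    as.vector(t(idx[keep, , drop = FALSE]))
)
E(g_small)$weight <- pair_d[keep]
```

The resulting `g_small` has exactly `k_edges` edges and can be passed to either plotting function above in place of `g`, with a layout that has one row per vertex.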