It might be faster to call average.path.length(G). Docs here: http://cneurocvs.rmki.kfki.hu/igraph/doc/R/shortest.paths.html
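For instance, a minimal sketch (the 5000-node Barabási graph here is just a stand-in, matching my experiment below; substitute your own graph):

library(igraph)
G <- barabasi.game(5000)  # stand-in graph; use your own
average.path.length(G)    # mean shortest path length over all connected pairs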
EDIT: Sorry about that confusion. I think there are two general strategies you can use here:
- Parallelize this.
- Calculate shortest paths against a sampling of nodes instead of against the entire graph, and estimate the average shortest path as the mean of these sampled paths.
I experimented with this a bit on my laptop and observed a significant improvement in performance from parallelization and a smaller one from subsampling (roughly 70%, followed by an additional 10%, for a total of around 72% improvement on a 5000-node Barabási graph). Additionally, the average percent sampling error falls off quickly, and there isn't much change between 5% and 10% sampling (this may be a consequence of my choice of a Barabási graph for this experiment). If you have access to an HPC cluster, either or both of these strategies should be well suited to being mapped out to separate workers.
Here's your code modified to use parallelization and subsampling:
library(igraph)
library(foreach)
library(doParallel)

# Spin up a local cluster and register it as the parallel backend
cl <- makeCluster(8)
registerDoParallel(cl)

sampled_est_par <- function(G, perc_subsample=.1, max_samples=1e3){
  N <- gorder(G)
  # Number of target nodes to sample per source vertex, capped at max_samples
  m <- min(floor(perc_subsample*N), max_samples)
  foreach(v=1:N) %dopar% {
    # Distances from vertex v to a random sample of m target nodes
    q <- igraph::shortest.paths(G, v, sample(N, m))
    # Masking with is.finite turns Inf (unreachable) entries into NaN,
    # which na.rm drops; divide by m for the per-vertex mean
    rowSums(q*is.finite(q), na.rm = TRUE)/m
  }
}
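The function returns a list of per-vertex means, so to collapse those into a single estimate, average them (and stop the cluster once you're done). Something like the following should do it, using the G and cl defined above:

est <- mean(unlist(sampled_est_par(G)))  # estimated average shortest path length
stopCluster(cl)                          # release the workers when finished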
And here's my experimentation code: https://gist.github.com/dmarx/80b4d093bdcab2fff97ee0da2968084f