2
votes

I am running a series of large simulations over a grid, processing one row at a time, and I have found that my sampling functions are a bottleneck. I've attempted to use the foreach and doMC libraries to speed up the process, but either the parallel method is slower or I've been unable to write a function that would be correctly interpreted by foreach.

Looking at some other posts, it appears that my approach using foreach may be misguided in that the number of jobs I'm attempting greatly exceeds the number of available processors. I'm wondering if folks have suggestions for how best to implement parallelization in my situation. My simulations generally come in two types. In the first, I calculate a matrix whose rows contain a sampling interval for each element of the grid-row that I am processing. I then sample with runif (in the real simulations my rows contain ~9000 cells and I'm performing 10000 simulations).

# number of simulations per element
n <- 5

# Generate an example sampling interval: row 1 holds the minima, row 2 the maxima.
m.int1 <- matrix(seq(1, 20, 1), ncol = 10, nrow = 2)

# Define a function to sample over the interval defined in m.int1.
f.rand1 <- function(a) {
  runif(n, a[1], a[2])
}

# Run the simulation, with columns corresponding to the row elements and
# rows to the simulations.
sim1 <- round(apply(m.int1, 2, f.rand1))

In the second case, I am attempting to sample from a set of empirical distributions that are indexed by column in a matrix. The value of the grid-row element corresponds to the column to be sampled.

# number of simulations per element
n <- 5

# Generate a vector representing a row of grid values.
v.int2 <- round(runif(10, 1, 3))

# Define a matrix of data that contains the distributions to be sampled.
m.samples <- cbind(rep(5, 10), rep(4, 10), rep(3, 10))

f.sample <- function(a) {
  sample(m.samples[, a], n)
}

# Sample m.samples indexed by column number.
sim2 <- sapply(v.int2, f.sample)

In the second example, I was able to use foreach() and %dopar% to run in parallel, but the simulation took substantially longer than the serial code. In the case of the first example above, I could not write a function that foreach would accept. I'll include the code I used in the second case just to demonstrate my thinking, but I now realize that my approach carries too much overhead.

library(foreach)
library(doMC)
registerDoMC(2)

n = 5

# Sample m.samples indexed by column number using the parallel method.
sim2.par <- foreach(i = 1:length(v.int2), .combine = "cbind") %dopar%
  sample(m.samples[, v.int2[i]], n)
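
One way I could imagine reducing that overhead (a sketch only; the block count and worker count would need tuning, and `blocks`/`sim2.chunked` are just illustrative names) is to hand each worker a whole block of columns rather than one element per task, so the per-task communication cost is paid only once per worker:

```r
library(foreach)
library(doMC)
registerDoMC(2)

n <- 5
n.cores <- 2

# Same toy inputs as above.
v.int2 <- round(runif(10, 1, 3))
m.samples <- cbind(rep(5, 10), rep(4, 10), rep(3, 10))

# Split the element indices into one block per worker, so each parallel
# task processes a large slab and the scheduling overhead is paid only
# n.cores times instead of once per element.
blocks <- split(seq_along(v.int2),
                cut(seq_along(v.int2), n.cores, labels = FALSE))

sim2.chunked <- foreach(idx = blocks, .combine = "cbind") %dopar% {
  sapply(idx, function(i) sample(m.samples[, v.int2[i]], n))
}
```

The result has the same shape as the serial sim2 (n rows by length(v.int2) columns), since cbind reassembles the per-worker slabs in order.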

I'd appreciate some suggestions on an approach (and some code!) that would help me utilize parallelization efficiently. Again, the rows I'm processing generally contain about 9000 elements and we're conducting 10000 simulations per element, so my output simulation matrices are generally on the order of 10000 x 9000. Thanks for your help.

2
In cases where the individual iterations are short, overhead can be very expensive, relatively speaking. That's why you're not seeing any boost when running things on many cores. To put it differently, the jobs finish so fast that it takes more time to communicate than to do the actual work. – Roman Luštrik

2 Answers

1
votes

Here's a slight improvement of your first simulation. A bigger n may yield a bigger gain in run time.

> n <- 1000
> m.int1 <- matrix ( seq ( 1, 20, 1 ), ncol=10, nrow=2 )
> f.rand1 <- function(a) {
+    return(runif(n, a[1], a[2]))
+ }
> system.time(x1 <- replicate(n, round(apply(m.int1, 2, f.rand1))))
   user  system elapsed 
   2.84    0.06    2.95 
> system.time(x2 <- replicate(n, matrix(round(runif(n*10, min = m.int1[1, ], max = m.int1[2, ])), ncol = 10, byrow = TRUE)))
   user  system elapsed 
   2.48    0.06    2.61 
> head(x1[,,1])
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    4    5    7   10   12   13   16   17    20
[2,]    1    3    6    7   10   11   13   16   17    19
[3,]    1    3    6    7   10   12   14   16   18    20
[4,]    2    4    5    7    9   12   14   16   17    19
[5,]    1    4    5    7   10   12   14   16   17    20
[6,]    1    4    6    8    9   11   13   15   18    20
> head(x2[,,1])
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    4    6    7    9   12   14   16   17    20
[2,]    1    3    6    8   10   12   14   15   18    20
[3,]    2    4    5    7    9   11   13   15   17    20
[4,]    2    3    5    7    9   11   14   15   17    19
[5,]    2    3    6    7    9   12   13   16   17    20
[6,]    2    4    6    7   10   12   14   16   17    20
1
votes

Try using this instead of the two-step process; it skips the apply step:

f.rand2 <- function(a) {
  matrix(runif(n * ncol(a), rep(a[1, ], n), rep(a[2, ], n)), nrow = ncol(a))
}

f.rand2(m.int1)
           [,1]      [,2]      [,3]      [,4]      [,5]
 [1,]  1.693183  1.404336  1.067888  1.904476  1.161198
 [2,]  3.411118  3.852238  3.621822  3.969399  3.318809
 [3,]  5.966934  5.466153  5.624387  5.646181  5.347473
 [4,]  7.317181  7.106791  7.403022  7.442060  7.161711
 [5,]  9.491231  9.656023  9.518498  9.569379  9.812931
 [6,] 11.843074 11.594308 11.706276 11.744094 11.994256
 [7,] 13.375382 13.599407 13.416135 13.634053 13.539246
 [8,] 15.948597 15.532356 15.692132 15.442519 15.627716
 [9,] 17.856878 17.208313 17.804288 17.875288 17.232867
[10,] 19.214776 19.689534 19.732680 19.813718 19.866297
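
Because f.rand2 draws all n samples for every column in a single runif call, the replicate wrapper can be dropped entirely and the whole simulation matrix generated in one shot. A scaled-down sketch (m.int.big and sim.big are illustrative names; the full 9000 x 10000 case works the same way, memory permitting; note that in this layout rows are grid elements and columns are simulations, the transpose of sim1):

```r
# f.rand2 as defined above.
f.rand2 <- function(a) {
  matrix(runif(n * ncol(a), rep(a[1, ], n), rep(a[2, ], n)), nrow = ncol(a))
}

# Scaled-down stand-in for a real grid row: 900 elements, 1000 simulations.
n <- 1000
# Two random bounds per element, sorted so row 1 <= row 2 in each column.
m.int.big <- apply(matrix(runif(2 * 900, 1, 100), nrow = 2), 2, sort)
sim.big <- round(f.rand2(m.int.big))
dim(sim.big)  # 900 x 1000: rows are grid elements, columns are simulations
```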

For me it cuts the time in half:

> system.time(x1 <- replicate(n, round(apply(m.int1, 2, f.rand1))))
   user  system elapsed 
  1.088   0.470   1.550 

> system.time(x1 <- replicate(n, f.rand2(m.int1)))
   user  system elapsed 
  0.559   0.256   0.811