Why not load balance when parallel computing using snowfall?

Question

For a long time I have been using sfLapply for a lot of my parallel r scripts. However, recently as I have delved more into parallel computing, I have been using sfClusterApplyLB, which can save a lot of time if individual instances do not take the same amount of time to run. Were as sfLapply will wait for each instance of a batch to finish before loading a new batch (which may lead to idle instances), with sfClusterApplyLB instances that complete their task will immediately be assigned to remaining elements in the list thus potentially saving quite a bit of time when instances do not take precisely the same amount of time. This has led me to question why would we ever want to NOT load balance our runs when using snowfall? The only thing I have found so far is that, when there is an error in the paralleled script, sfClusterApplyLB will still cycle through the entire list before giving an error, while sfLapply will stop after trying the first batch. What else am I missing? are there any other costs/ downsides of load balancing? Below is an example code that shows the difference between the two

rm(list = ls()) #remove all past worksheet variables
working_dir="D:/temp/"
setwd(working_dir)
n_spp=16
spp_nmS=paste0("sp_",c(1:n_spp))
spp_nm=spp_nmS[1]
sp_parallel_run=function(sp_nm){
  sink(file(paste0(working_dir,sp_nm,"_log.txt"), open="wt"))#######NEW
  cat('\n', 'Started on ', date(), '\n') 
  ptm0 <- proc.time()
  jnk=round(runif(1)*8000000) #this is just a redundant script that takes an arbitrary amount of time to run
  jnk1=runif(jnk)
  for (i in 1:length(jnk1)){
    jnk1[i]=jnk[i]*runif(1)
  }
  ptm1=proc.time() - ptm0
  jnk=as.numeric(ptm1[3])
  cat('\n','It took ', jnk, "seconds to model", sp_nm)

  #stop sinks
  sink.reset <- function(){
    for(i in seq_len(sink.number())){
      sink(NULL)
    }
  }
  sink.reset()
}
require(snowfall)
cpucores=as.integer(Sys.getenv('NUMBER_OF_PROCESSORS'))

sfInit( parallel=T, cpus=cpucores) # 
sfExportAll() 
system.time((sfLapply(spp_nmS,fun=sp_parallel_run)))
sfRemoveAll()
sfStop()

sfInit( parallel=T, cpus=cpucores) # 
sfExportAll() 
system.time(sfClusterApplyLB(spp_nmS,fun=sp_parallel_run)) 
sfRemoveAll()
sfStop()

Steve Weston Steve Weston · Accepted Answer · 2014-02-03T21:20:05

The sfLapply function is useful because it splits up the input values into one group of tasks for each available worker, which is what the mclapply function calls prescheduling. This can give much better performance than sfClusterApplyLB when the tasks don't take long.

Here's an extreme example that demonstrates the advantages of prescheduling:

> system.time(sfLapply(1:100000, sqrt))
   user  system elapsed
  0.148   0.004   0.170
> system.time(sfClusterApplyLB(1:100000, sqrt))
   user  system elapsed
 19.317   1.852  21.222

Why not load balance when parallel computing using snowfall?

1 Answers