For a long time I have been using sfLapply for a lot of my parallel r scripts. However, recently as I have delved more into parallel computing, I have been using sfClusterApplyLB, which can save a lot of time if individual instances do not take the same amount of time to run. Were as sfLapply will wait for each instance of a batch to finish before loading a new batch (which may lead to idle instances), with sfClusterApplyLB instances that complete their task will immediately be assigned to remaining elements in the list thus potentially saving quite a bit of time when instances do not take precisely the same amount of time. This has led me to question why would we ever want to NOT load balance our runs when using snowfall? The only thing I have found so far is that, when there is an error in the paralleled script, sfClusterApplyLB will still cycle through the entire list before giving an error, while sfLapply will stop after trying the first batch. What else am I missing? are there any other costs/ downsides of load balancing? Below is an example code that shows the difference between the two
rm(list = ls()) #remove all past worksheet variables
working_dir="D:/temp/"
setwd(working_dir)
n_spp=16
spp_nmS=paste0("sp_",c(1:n_spp))
spp_nm=spp_nmS[1]
sp_parallel_run=function(sp_nm){
sink(file(paste0(working_dir,sp_nm,"_log.txt"), open="wt"))#######NEW
cat('\n', 'Started on ', date(), '\n')
ptm0 <- proc.time()
jnk=round(runif(1)*8000000) #this is just a redundant script that takes an arbitrary amount of time to run
jnk1=runif(jnk)
for (i in 1:length(jnk1)){
jnk1[i]=jnk[i]*runif(1)
}
ptm1=proc.time() - ptm0
jnk=as.numeric(ptm1[3])
cat('\n','It took ', jnk, "seconds to model", sp_nm)
#stop sinks
sink.reset <- function(){
for(i in seq_len(sink.number())){
sink(NULL)
}
}
sink.reset()
}
require(snowfall)
cpucores=as.integer(Sys.getenv('NUMBER_OF_PROCESSORS'))
sfInit( parallel=T, cpus=cpucores) #
sfExportAll()
system.time((sfLapply(spp_nmS,fun=sp_parallel_run)))
sfRemoveAll()
sfStop()
sfInit( parallel=T, cpus=cpucores) #
sfExportAll()
system.time(sfClusterApplyLB(spp_nmS,fun=sp_parallel_run))
sfRemoveAll()
sfStop()