Parallelized optimization in R

Question

I'm running R on linux box that has 8 multicore processors, and have an optimization problem I'd like to speed up by parallelizing the optimization routine itself. Importantly, this problem involves (1) multiple parameters, and (2) inherently slow model runs. A fairly common problem!

Anyone know of a parallelized optimizer for such occasions?

More specifically, solvers like nlm() run multiple model evaluations (two per parameter value) each time the algorithm takes a step in parameter space, so parallelizing that instance of multiple model runs would greatly speed things up in these situations when more than a few parameter values are being fit.

It seems like code that makes use of the package parallel could be written in a way that the user would have to do minimal code modification to move from using nlm() or optim() to this parallelized optimization routine. That is, it seems one could rewrite these routines basically with no changes, except that the step of calling the model multiple times, as is common in gradient-based methods, would be done in parallel.

Ideally, something like nlmPara() would take code that looks like

fit <- nlm(MyObjFunc, params0);

and require only minor modifications, e.g.,

fit <- nlmPara(MyObjFunc, params0, ncores=6);

Thoughts/suggestions?

PS: I've taken steps to speed up those model runs, but they're slow for a variety of reasons (i.e. I don't need advice on speeding up the model runs! ;-) ).

A little more reading into the different optimizers, and it looks like this sort of hack would require rewriting C code (e.g., rewriting the C port of the OPTIF9 routine to use multiple threads) or writing a native R optimizer to take advantage of the higher level parallelization option like parallel, multicore, snow, etc. — Paul J Hurtado
The optimx/optimplus package has native-R versions of a lot of optimization algorithms: maybe easiest to start from there ... ? — Ben Bolker
Thanks Ben :-) optimx allows you to input a gradient function. I'll try it out and see if I can't just hand it a parallelized block of code, which should do the trick. — Paul J Hurtado
I have a couple more thoughts -- there might be some parallel + memoization tricks? Several of the built-in optim() optimizers also take optional gr arguments — Ben Bolker
Would the rgenoud package work for you? The genoud function from that package takes a cluster argument that supports parallel computing via the snow package, although you'd have to use cluster=rep('localhost', 6) rather than cluster=6. — Steve Weston

Paul J Hurtado Paul J Hurtado · Accepted Answer · 2013-03-23T03:04:20

Here is a rough solution, that at least has some promise. Big thanks to Ben Bolker for pointing out that many/most optimization routines allow user-specified gradient functions.

A test problem with more parameter values might show more significant improvements, but on an 8 core machine the run using the parallelized gradient function takes about 70% as long as the serial version. Note the crude gradient approximation used here seems to slow convergence and thus adds some time to the process.

## Set up the cluster
require("parallel");
.nlocalcores = NULL; # Default to "Cores available - 1" if NULL.
if(is.null(.nlocalcores)) { .nlocalcores = detectCores() - 1; }
if(.nlocalcores < 1) { print("Multiple cores unavailable! See code!!"); return()}
print(paste("Using ",.nlocalcores,"cores for parallelized gradient computation."))
.cl=makeCluster(.nlocalcores);
print(.cl)


# Now define a gradient function: both in serial and in parallel
mygr <- function(.params, ...) {
  dp = cbind(rep(0,length(.params)),diag(.params * 1e-8)); # TINY finite difference
  Fout = apply(dp,2, function(x) fn(.params + x,...));     # Serial 
  return((Fout[-1]-Fout[1])/diag(dp[,-1]));                # finite difference 
}

mypgr <- function(.params, ...) { # Now use the cluster 
  dp = cbind(rep(0,length(.params)),diag(.params * 1e-8));   
  Fout = parCapply(.cl, dp, function(x) fn(.params + x,...)); # Parallel 
  return((Fout[-1]-Fout[1])/diag(dp[,-1]));                  #
}


## Lets try it out!
fr <- function(x, slow=FALSE) { ## Rosenbrock Banana function from optim() documentation.
  if(slow) { Sys.sleep(0.1); }   ## Modified to be a little slow, if needed.
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x, slow=FALSE) { ## Gradient of 'fr'
  if(slow) { Sys.sleep(0.1); }   ## Modified to be a little slow, if needed.
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 *      (x2 - x1 * x1))
}

## Make sure the nodes can see these functions & other objects as called by the optimizer
fn <- fr;  # A bit of a hack
clusterExport(cl, "fn");

# First, test our gradient approximation function mypgr
print( mypgr(c(-1.2,1)) - grr(c(-1.2,1)))

## Some test calls, following the examples in the optim() documentation
tic = Sys.time();
fit1 = optim(c(-1.2,1), fr, slow=FALSE);                          toc1=Sys.time()-tic
fit2 = optim(c(-1.2,1), fr, gr=grr, slow=FALSE, method="BFGS");   toc2=Sys.time()-tic-toc1
fit3 = optim(c(-1.2,1), fr, gr=mygr, slow=FALSE, method="BFGS");  toc3=Sys.time()-tic-toc1-toc2
fit4 = optim(c(-1.2,1), fr, gr=mypgr, slow=FALSE, method="BFGS"); toc4=Sys.time()-tic-toc1-toc2-toc3


## Now slow it down a bit
tic = Sys.time();
fit5 = optim(c(-1.2,1), fr, slow=TRUE);                           toc5=Sys.time()-tic
fit6 = optim(c(-1.2,1), fr, gr=grr, slow=TRUE, method="BFGS");    toc6=Sys.time()-tic-toc5
fit7 = optim(c(-1.2,1), fr, gr=mygr, slow=TRUE, method="BFGS");   toc7=Sys.time()-tic-toc5-toc6
fit8 = optim(c(-1.2,1), fr, gr=mypgr, slow=TRUE, method="BFGS");  toc8=Sys.time()-tic-toc5-toc6-toc7

print(cbind(fast=c(default=toc1,exact.gr=toc2,serial.gr=toc3,parallel.gr=toc4),
            slow=c(toc5,toc6,toc7,toc8)))

Parallelized optimization in R

4 Answers