Question

I've noticed that foreach/%dopar% sets up the cluster sequentially, not in parallel, before it executes the tasks in parallel. If each worker needs a dataset and transferring that dataset to one worker takes N seconds, then foreach/%dopar% spends #workers * N seconds on setup. This can be significant with a large number of workers or a large N (large datasets to transfer).

My question is whether this is by design, or whether there is some parameter/setting I'm missing in foreach or perhaps in cluster creation.

Setup

  • R 2.15.2
  • latest versions of foreach/parallel/doParallel as of today (1/7/2013)
  • Windows 7 x64

Example

library( foreach )
library( parallel )
library( doParallel )

# lots of data (1e8 doubles, roughly 800 MB)
data <- rnorm( 1e8 )

# make cluster/register - creates 6 nodes fairly quickly
cluster <- makePSOCKcluster( 6 , outfile = "" )
registerDoParallel( cluster )

# Fire up Task Manager. Observe that each node receives data sequentially.
# Only when the last node has its copy do all nodes start processing at once.
# ('data' is exported automatically because the loop body references it.)
results <- foreach( i = 1 : 500 ) %dopar% {
    print( data[ i ] )
    data[ i ]
}

stopCluster( cluster )

BenBarnes: The "sequential setup" is the only behavior I've experienced without using a shared-memory cluster. If there's a way to speed this up without shared memory, I'd be very interested, too. However, since clusterExport() (via clusterCall()) executes sequentially, I'm not holding my breath.
SFun28: Ben - Could you elaborate? And pardon my ignorance on things related to OS/memory... In the example there are 6 child processes, and I would think there is an opportunity to send data to them in parallel. Is it that the parent process can only access that internal data sequentially?
BenBarnes: This is at the limit of my knowledge of cluster communication, but on Unix-alike systems one can fork a process, which lets child processes access objects loaded in the parent process and copy only those that are modified. Windows machines don't have this capability, and with all of the cluster types I've used (which is not all of them), cluster setup has happened sequentially.
SFun28: Ben - thanks! Hopefully someone chimes in about the possibility of doing this in parallel. Or perhaps the reason you've always seen it happen sequentially is that there is no other way to do it.
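On a Unix-alike, the fork approach described in the comments can be tried with a fork cluster from the parallel package. A minimal sketch, assuming Linux or macOS (makeForkCluster() is not available on Windows) and a smaller vector than in the question so it runs quickly. The forked workers find data in their own global environment without any export step. Since doParallel registers ordinary parallel clusters, registerDoParallel(cl) should presumably accept such a cluster as well, though that is not exercised here.

```r
# Sketch, assuming a Unix-alike OS (Linux/macOS): fork clusters are not
# supported on Windows.
library(parallel)

# Large object created in the master's global environment
data <- rnorm(1e6)

# Forked workers start as copies of the master process, so they inherit
# 'data' instead of receiving it serialized over a socket.
cl <- makeForkCluster(nnodes = 2)

# Each worker checks its own global environment; inherits = FALSE avoids
# matching the utils::data() function instead of our vector.
seen <- unlist(clusterEvalQ(cl, exists("data", envir = .GlobalEnv, inherits = FALSE)))

# Workers index into their inherited copy directly
vals <- parSapply(cl, 1:4, function(i) data[i])

stopCluster(cl)
print(seen)
print(vals)
```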

Answer

Thanks to Rich at Revolution Computing for helping with this one.

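clusterCall() uses a for loop to send the call and its arguments to each worker in turn. Because R is single-threaded, that loop runs sequentially: the master finishes writing to one worker's socket before it starts on the next.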
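One way to observe this is to time an export to a single worker against an export to all of them. A minimal sketch, assuming a local PSOCK cluster of 4 workers and a synthetic vector named payload (both chosen only for illustration). clusterExport() delegates to clusterCall(), so if the sends are sequential, t_all should come out at roughly four times t_one. The exact numbers depend on the machine.

```r
library(parallel)

# Roughly 40 MB payload; increase it to make the effect more visible.
payload <- rnorm(5e6)

cl <- makePSOCKcluster(4)

# Export the same object to the first worker only, then to all four.
# clusterExport() goes through clusterCall(), which sends to each node in turn.
t_one <- system.time(clusterExport(cl[1], "payload"))[["elapsed"]]
t_all <- system.time(clusterExport(cl, "payload"))[["elapsed"]]

stopCluster(cl)

cat(sprintf("1 worker: %.2f s, 4 workers: %.2f s\n", t_one, t_all))
```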

There are a few possible solutions (which would require someone to code them up):

  • R could call out to C/C++ to thread the worker setup.
  • The workers could pull the data from a file on disk.
  • The workers could listen on the same socket, and the master could write to that socket just once and have the data broadcast to all workers.
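The file-based option needs nothing beyond base R's parallel package. A minimal sketch, assuming the workers run on the same machine as the master (or share a filesystem); load_data is a hypothetical helper name. Only the short file path travels over each socket, and because clusterCall() sends the call to every node before it waits for any results, the workers' reads can overlap. With foreach, one would presumably also pass .noexport = "data" to foreach() so that %dopar% does not ship the master's copy again.

```r
library(parallel)

data <- rnorm(1e6)

# Write the object to disk once, from the master.
path <- tempfile(fileext = ".rds")
saveRDS(data, path)

cl <- makePSOCKcluster(4)

# Hypothetical helper: each worker reads the file itself and stores the
# result in its own global environment.
load_data <- function(p) {
  assign("data", readRDS(p), envir = .GlobalEnv)
  NULL  # return nothing, so the data is not shipped back to the master
}

# Only 'path' is serialized to each worker; the reads happen on the workers.
invisible(clusterCall(cl, load_data, path))

# Workers now find 'data' in their own global environment.
vals <- parSapply(cl, 1:4, function(i) data[i])

stopCluster(cl)
unlink(path)
print(vals)
```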