I'm running a parallel operation using a SOCK cluster with workers on the local machine. If I limit the set I'm iterating over (in one test, 70 tasks instead of the full 135), everything works fine. If I run the full set, I get the error "Error in unserialize(socklist[[n]]) : error reading from connection".
I've unblocked the port in Windows Firewall (both inbound and outbound) and allowed full access for Rscript/R.
It can't be a timeout issue because the socket timeout is set to 365 days.
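(For reference, I raised the timeout through snow's cluster options; roughly like this, assuming setDefaultClusterOptions takes a timeout in seconds:)
library( snow )
setDefaultClusterOptions( timeout = 60 * 60 * 24 * 365 )   # one year, in seconds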
It's not an issue with any particular task, because the full set runs fine sequentially (and it also runs fine in parallel if I split the dataset in half and do two separate parallel runs).
The best explanation I can come up with is that too much data is being transferred over the sockets, but there doesn't seem to be a cluster option to throttle the amount of data.
I'm at a loss on how to proceed. Has anyone seen this issue before or can suggest a fix?
Here's the code I'm using to set up the cluster:
library( doSNOW )   # also loads snow
cluster = makeCluster( degreeOfParallelism , type = "SOCK" , outfile = "" )   # degreeOfParallelism = number of local workers
registerDoSNOW( cluster )
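The tasks themselves are dispatched with foreach/%dopar%; roughly this shape (runTask and taskList are stand-ins for my actual function and inputs, not the real code):
library( foreach )
results = foreach( task = taskList ) %dopar% {
  runTask( task )   # placeholder for the real per-task work
}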
Edit
While this issue is consistent with the entire dataset, it also appears from time to time with a reduced dataset. That might suggest that this isn't simply a data-limit issue.
Edit 2
I dug a little deeper, and it turns out that my function has a random component that sometimes causes a task to raise an error. If I run the tasks serially, then at the end of the operation I'm told which task failed. If I run them in parallel, I get the "unserialize" error instead. I tried wrapping the code executed by each task in a tryCatch call with error = function(e) { stop(e) }, but that also produces the "unserialize" error. I'm confused, because I thought snow handled errors by passing them back to the master?
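For concreteness, the wrapper looks roughly like this (runTask is again a stand-in); the commented-out variant returns the condition object as the task's result instead of re-signalling it, which should let failures come back as ordinary values if the transport itself is intact:
results = foreach( task = taskList ) %dopar% {
  tryCatch(
    runTask( task ) ,
    error = function( e ) stop( e )   # re-signal: still produces the "unserialize" error
    # error = function( e ) e         # variant: return the condition object as the result
  )
}
# with the variant handler, failed tasks can be identified afterwards:
# which( vapply( results, inherits, logical(1), "error" ) )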