9 votes

I'm running a parallel operation using a SOCK cluster with workers on the local machine. If I limit the set I'm iterating over (in one test, 70 instead of the full 135 tasks), everything works just fine. If I go for the full set, I get the error "Error in unserialize(socklist[[n]]) : error reading from connection".

  • I've unblocked the port in Windows Firewall (both in and out) and allowed all access for Rscript/R.

  • It can't be a timeout issue because the socket timeout is set to 365 days (see the sketch after this list).

  • It's not an issue with any particular task, because I can run the full set sequentially just fine (it also runs fine in parallel if I split the dataset in half and do two separate parallel runs).

  • The best explanation I can come up with is that too much data is being transferred over the sockets, but there doesn't seem to be a cluster option to throttle data limits.
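
For reference, here is roughly how the 365-day timeout mentioned above can be set. This is a sketch using snow's setDefaultClusterOptions, which is one documented way to change the default; the exact call used originally isn't shown in the question:

library(snow)

# Raise the default socket timeout to 365 days (value is in seconds).
# This must run before makeCluster() so new workers inherit it.
setDefaultClusterOptions(timeout = 60 * 60 * 24 * 365)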

I'm at a loss on how to proceed. Has anyone seen this issue before, or can anyone suggest a fix?

Here's the code I'm using to set up the cluster:

library(doSNOW)  # loads snow as well

cluster <- makeCluster(degreeOfParallelism, type = "SOCK", outfile = "")
registerDoSNOW(cluster)

Edit
While this issue is consistent with the entire dataset, it also appears from time to time with a reduced dataset. That might suggest that this isn't simply a data-limit issue.

Edit 2
I dug a little deeper, and it turns out that my function has a random component that sometimes causes a task to raise an error. If I run the tasks serially, then at the end of the operation I'm told which task failed. If I run in parallel, I get the "unserialize" error instead. I tried wrapping the code executed by each task in a tryCatch call with error = function(e) { stop(e) }, but that also generates the "unserialize" error. I'm confused because I thought snow handled errors by passing them back to the master?
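
One workaround sketch, not verified against the original code: instead of re-raising with stop(e), have tryCatch return the condition object, so a failed task yields an ordinary value the master can unserialize. Here doTask() is a hypothetical stand-in for the per-task function:

library(foreach)  # %dopar% dispatches to the registered doSNOW backend

results <- foreach(i = 1:135) %dopar% {
  tryCatch(
    doTask(i),              # hypothetical per-task function
    error = function(e) e   # return the condition object instead of re-raising it
  )
}

# Identify which tasks failed once everything has come back:
failed <- which(vapply(results, inherits, logical(1), "error"))

Because every task then returns a regular R object, the master never encounters a broken result during unserialization, and the failing tasks can be inspected afterwards.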

R is limited to 128 simultaneous open connections... maybe that's part of it? – Joshua Ulrich

I am testing with 8 connections. – SFun28

But your question says everything works fine with 70 tasks, so I'm confused. – Joshua Ulrich

I think you're confusing tasks with connections. I have up to 8 connections processing many more tasks. In this case I have 135 tasks that I want to run in parallel, but only 8 cores on the CPU on which to process those tasks (in practice I never go above 7; I like to leave one available for the OS). – SFun28

Yes, I'm confused because the packages you're using don't use "tasks" to describe anything they do, and you don't provide an example of what you mean by "tasks", so I'm trying to figure out what you mean. A minimal example that produces the behavior you describe would go a long way toward someone helping. As it stands, you require someone to replicate the behavior before they can even start investigating the cause. This may be why the author of snow ignored your email. – Joshua Ulrich

1 Answer

2 votes

I have reported this issue to the author of snow, but unfortunately there has been no reply.

Edit
I haven't seen this issue in a while. I moved to parallel/doParallel, and I now wrap any code that gets executed in parallel in try(). I can no longer reproduce the original issue.
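
For completeness, a minimal sketch of that setup; the doTask() function, the cluster size, and the task count are hypothetical placeholders:

library(doParallel)  # attaches parallel and foreach

cluster <- makeCluster(7)  # leave one core free for the OS
registerDoParallel(cluster)

results <- foreach(i = 1:135) %dopar% {
  try(doTask(i))  # a failed task returns a "try-error" object instead of killing the run
}

stopCluster(cluster)

Because try() converts an error into an ordinary return value, the master never hits a broken socket while unserializing results.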