
The origin of my question is Nathan VanHoudnos's hack for implementing a parallel version of "lapply" (mclapply) on Windows:

http://edustatistics.org/nathanvan/2014/07/14/implementing-mclapply-on-windows/

My objective is to use "pdist", the distance function from the pdist package, to calculate the distances between the rows of two matrices.

The original code works just fine; however, something goes wrong when I try to use the 'pdist' function from the pdist library.

I also made sure that the 'pdist' function name was included in the clusterExport() call.

The error I get is:

Error in checkForRemoteErrors(val) : 4 nodes produced errors; first error: C symbol name "Rpdist" not in load table

The code to reproduce it is here:

## Load packages 
require(parallel)
require(pdist)

# Define global variables 
A = rbind(c(3,40,1),c(24,13,2), c(90,8,1));
B = rbind(c(23,4,11),c(13,913,12), c(0.9,0.8,0.1));

## Step 1: Create a cluster of child processes 
cat("\n Step 1: Create a cluster of child processes...."); 
cl <- makeCluster(4)

## Step 2: Load the necessary R package(s)
## N.B. length(cl) is the number of child processes in the cluster 
cat("\n Step 2: Load the necessary R package(s)....");
par.setup <- parLapply (cl, 1:length(cl),
    function(xx) {
        require(pdist) 
})

## Step 3: Distribute the necessary R objects 
cat("\n Step 3: Distribute the necessary R objects....");
clusterExport (cl, c('A', 'B', 'pdist'))

## Step 4: Do the computation
cat("\n Step 4: Do the multi-core computation....\n");
par.Distance <- parLapply (cl, 1:4,
    function(xx) {
       as.matrix(pdist(A, B))            
    })

## Step 5: Remember to stop the cluster!
cat("\n Step 5: Stop the clusters....\n");
stopCluster(cl)

cat("\n Output: "); print(par.Distance);
cat("\n ----------------------------- \n");

Thanks for any help.

Are you sure the libraries are loaded on the workers by par.setup? In the help file (?clusterExport), the library call is explicit at the beginning of the function. I tried this and it returned four 3x3 matrices. – Roman Luštrik
Hi Roman, thank you! Checking par.setup helped me debug my problem, and Steve's answer below resolved it. Thanks a lot. – RajeshS

1 Answer


Your example worked for me, so I suspect that the workers are not able to load the pdist package successfully on your machine. The value of par.setup should be a list containing four TRUE values. If it isn't, you need to resolve that problem first, possibly by calling .libPaths() on the workers to set the library search path just before loading pdist.
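As a rough way to check this, the sketch below compares the library search paths on the master and on the workers and asks each worker whether it can find pdist. The two-worker cluster and the variable names (master_paths, worker_paths, pdist_available) are only illustrative assumptions, and copying the master's paths to the workers is one possible fix, not necessarily the right one for your setup.

```r
library(parallel)

# Small cluster just for the diagnosis (size chosen arbitrarily)
cl <- makeCluster(2)

# Library search paths on the master and on each worker; a mismatch is a
# common reason a package loads interactively but not on the workers.
master_paths <- .libPaths()
worker_paths <- clusterEvalQ(cl, .libPaths())
print(master_paths)
print(worker_paths)

# One logical value per worker: TRUE means the worker can find pdist
pdist_available <- unlist(
  clusterEvalQ(cl, requireNamespace("pdist", quietly = TRUE))
)
print(pdist_available)

# If any worker cannot find it, try giving the workers the master's
# library paths and check again.
if (!all(pdist_available)) {
  clusterCall(cl, function(paths) .libPaths(paths), master_paths)
  pdist_available <- unlist(
    clusterEvalQ(cl, requireNamespace("pdist", quietly = TRUE))
  )
  print(pdist_available)
}

stopCluster(cl)
```

If pdist_available is still not all TRUE after this, the package is probably not installed in any library the workers can see.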

Also, there's no point in exporting pdist to the workers using clusterExport. It isn't necessary if you can load the pdist package successfully on the workers, and it isn't sufficient, because the function depends on code in the pdist package that clusterExport doesn't send to the workers. All it does is change the error message to the one you're seeing now.
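For reference, here is a minimal sketch of the same computation with only A and B exported. It assumes pdist is installed where the workers can find it. Using clusterEvalQ with library() instead of require() inside parLapply is my own substitution: library() stops with an error on any worker that cannot load the package, so the failure shows up at the loading step and not later inside pdist.

```r
library(parallel)

# Same data as in the question
A <- rbind(c(3, 40, 1), c(24, 13, 2), c(90, 8, 1))
B <- rbind(c(23, 4, 11), c(13, 913, 12), c(0.9, 0.8, 0.1))

cl <- makeCluster(4)

# Attach pdist on every worker; this fails loudly if a worker cannot load it
clusterEvalQ(cl, library(pdist))

# Export only the data objects, not the pdist function itself
clusterExport(cl, c("A", "B"), envir = environment())

# Each task computes the distances between the rows of A and the rows of B
par.Distance <- parLapply(cl, 1:4, function(xx) {
  as.matrix(pdist(A, B))
})

stopCluster(cl)

print(par.Distance)
```

With the package loaded on each worker, the call to pdist there can reach the compiled code it needs, which the exported copy of the function alone could not.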