My question grew out of Nathan VanHoudnos's hack for a parallel-processing implementation of "lapply" on Windows:
http://edustatistics.org/nathanvan/2014/07/14/implementing-mclapply-on-windows/
My goal is to use "pdist", the distance function from the 'pdist' package, to compute the distances between the rows of two matrices.
The original code works fine, but it fails when I call the 'pdist' function inside the worker processes.
I also made sure that the 'pdist' function name was included in the clusterExport() call.
The error I get is:
Error in checkForRemoteErrors(val) : 4 nodes produced errors; first error: C symbol name "Rpdist" not in load table
Here is the code that reproduces it:
## Load packages
require(parallel)
require(pdist)
# Define global variables
A = rbind(c(3,40,1),c(24,13,2), c(90,8,1));
B = rbind(c(23,4,11),c(13,913,12), c(0.9,0.8,0.1));
## Step 1: Create a cluster of child processes
cat("\n Step 1: Create a cluster of child processes....");
cl <- makeCluster(4)
## Step 2: Load the necessary R package(s)
## N.B. length(cl) is the number of child processes in the cluster
cat("\n Step 2: Load the necessary R package(s)....");
par.setup <- parLapply (cl, 1:length(cl),
    function(xx) {
        require(pdist)
    })
## Step 3: Distribute the necessary R objects
cat("\n Step 3: Distribute the necessary R objects....");
clusterExport (cl, c('A', 'B', 'pdist'))
## Step 4: Do the computation
cat("\n Step 4: Do the multi-core computation....\n");
par.Distance <- parLapply (cl, 1:4,
    function(xx) {
        as.matrix(pdist(A, B))
    })
## Step 5: Remember to stop the cluster!
cat("\n Step 5: Stop the clusters....\n");
stopCluster(cl)
cat("\n Output: "); print(par.Distance);
cat("\n ----------------------------- \n");
Thanks for any help.
par.setup? In the help file of ?clusterExport, the library call is explicit at the beginning of the function. I tried this and it returned four 3x3 matrices. – Roman Luštrik
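
A minimal sketch of what that comment seems to describe, assuming the 'pdist' package is installed on the machine. The worker function calls library(pdist) itself, as in the ?clusterExport example. Leaving 'pdist' out of clusterExport() is my assumption about what the working variant looked like; the comment does not say so. The cluster size of 2 is arbitrary.

```r
library(parallel)

# Same matrices as in the question
A <- rbind(c(3, 40, 1), c(24, 13, 2), c(90, 8, 1))
B <- rbind(c(23, 4, 11), c(13, 913, 12), c(0.9, 0.8, 0.1))

cl <- makeCluster(2)

# Export only the data objects; the pdist function itself is NOT
# exported here (assumption, see the note above).
clusterExport(cl, c("A", "B"))

par.Distance <- parLapply(cl, 1:4, function(xx) {
  library(pdist)           # attach the package on the worker itself
  as.matrix(pdist(A, B))   # distances between rows of A and rows of B
})

stopCluster(cl)

print(par.Distance)
```

If the package should be attached once per worker rather than inside every task, clusterEvalQ(cl, library(pdist)) right after makeCluster() is another option from the 'parallel' package that may be worth trying.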