incorrect number of dimensions error using parLapply

Question

I am trying to parallelize a function across the 4 cores of my machine using parLapply. My function uses two nested loops to fill in some empty columns of a predefined matrix M. However, when I run the code below I get the following error:

2 nodes produced errors; first error: incorrect number of dimensions

Code:

require("parallel")
TheData<-list(E,T)        # list of 2 matrices of different dimensions, T is longer and wider than E

myfunction <- function(TheData) {
for (k in 1:length(TheData[[1]][,1])) {
    distance<-matrix(,nrow=length(TheData[[1]][,1]),ncol=1)
     for (j in 1:length(TheData[[2]][,1])) {
    distance[j]<-sqrt((as.numeric(TheData[[2]][j,1])-as.numeric(TheData[[1]][k,2]))^2+(as.numeric(TheData[[2]][j,2])-as.numeric(TheData[[1]][k,1]))^2)              
    }         
    index<-which(distance == min(distance))
    M[k,4:9]<-c(as.numeric(TheData[[2]][index,1]),as.numeric(TheData[[2]][index,2]),as.numeric(TheData[[2]][index,3]),as.numeric(TheData[[2]][index,4]),as.numeric(TheData[[2]][index,5]),as.numeric(TheData[[2]][index,6]))   
rm(distance)
gc() 
}  
}
n_cores <- 4
Cl = makeCluster(n_cores)
Results <- parLapplyLB(Cl, TheData, myfunction)
# I also tried: Results <- parLapply(Cl, TheData, myfunction)

Steve Weston · Accepted Answer · 2014-06-06T15:57:43

In your example, parLapply is iterating over a list of matrices, and passing those matrices as the argument to "myfunction". However, "myfunction" seems to expect its argument to be a list of two matrices, and so an error occurs. I can reproduce that error with:

> E <- matrix(0, 4, 4)
> E[[1]][,1]
Error in E[[1]][, 1] : incorrect number of dimensions
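
To see what the function actually receives, here is a small sketch that uses plain lapply in place of parLapply (both hand the list elements to the function one at a time). The matrix sizes and the name T_mat are made-up stand-ins for your data, not something taken from your code.

```r
# Stand-in data: two matrices of different shapes (hypothetical sizes)
E <- matrix(0, 4, 4)
T_mat <- matrix(0, 6, 5)
TheData <- list(E, T_mat)

# Record what each call receives: a bare matrix, not a list of two matrices
seen <- lapply(TheData, function(x) c(is_list = is.list(x), is_matrix = is.matrix(x)))

# On a matrix, [[1]] extracts a single number, so the following [, 1] fails
msg <- tryCatch(E[[1]][, 1], error = function(e) conditionMessage(e))

print(seen)
print(msg)
```

Each call sees one matrix on its own, which is why indexing it as TheData[[1]][,1] inside "myfunction" fails.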

I'm not sure what you're really trying to do, but with the current implementation of "myfunction", I would expect you to call parLapply with a list of lists containing two matrices, such as:

TheDataList <- list(list(A,B), list(C,D), list(E,F), list(G,H))

Passing this as the second argument to parLapply would result in "myfunction" being called four times, each time with a list containing two matrices.

But your example has another problem. It looks like you expect parLapply to modify the matrix "M" as a side effect, but it can't: each worker is a separate R process, so nothing it assigns to "M" reaches the master session. I think you should change "myfunction" to return a matrix. parLapply will return the matrices in a list which you can then bind together into the desired result.
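
As a rough sketch of that return-and-bind pattern: the function block_result, the input X and the two-worker cluster below are all hypothetical, chosen only to keep the example small. It is not your distance computation.

```r
library(parallel)

# Hypothetical task: each call gets a block of rows and returns a matrix
# with one row per input row and two computed columns.
block_result <- function(block) {
  cbind(row_sum = rowSums(block), row_max = apply(block, 1, max))
}

X <- matrix(1:12, nrow = 6)                       # made-up input data
blocks <- list(X[1:3, , drop = FALSE], X[4:6, , drop = FALSE])

cl <- makeCluster(2)
pieces <- parLapply(cl, blocks, block_result)     # a list of matrices
stopCluster(cl)

# Assemble the final matrix on the master from the returned pieces
M <- do.call(rbind, pieces)
print(dim(M))
```

The workers never touch "M"; it only exists on the master, built from what the workers return.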

Update

From your comment, I now believe that you essentially want to parallelize "myfunction". Here's my attempt to do that:

library(parallel)
cl <- makeCluster(4)

myfunction <- function(Exy) {
  iM <- integer(nrow(Exy))
  for (k in 1:nrow(Exy)) {
    distance <- sqrt((Txy[,1] - Exy[k,2])^2 + (Txy[,2] - Exy[k,1])^2)
    iM[k] <- which.min(distance)
  }
  iM
}

# Random example data for testing
T <- matrix(rnorm(150), 10)
E <- matrix(rnorm(120), 10)

# Only export the first two columns of T to the workers
Txy <- T[,1:2]
clusterExport(cl, c('Txy'))

# Parallelize "myfunction" by calling it in parallel on block rows of "E".
ExyList <- parallel:::splitRows(E[,1:2], length(cl))
iM <- do.call('c', clusterApply(cl, ExyList, myfunction))

# Update "M" using data from "T" indexed by "iM":
# row k of "M" (one per row of "E") gets columns 1:6 of the nearest row of "T"
M <- matrix(0, nrow(E), 9)  # more fake data
for (k in seq_along(iM)) {
  M[k,4:9] <- T[iM[k], 1:6]
}
print(M)

stopCluster(cl)
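
The vectorized distance expression in "myfunction" is meant to replace the inner loop from the question. Here is a small check, on made-up random data, that the two give the same distances for a single row of "E". The names e, d_loop and d_vec are only for illustration.

```r
set.seed(1)
Txy <- matrix(rnorm(20), 10)   # made-up target coordinates, 10 x 2
e <- rnorm(2)                  # one made-up row of Exy

# Loop version, as in the question: one distance per row of Txy
d_loop <- numeric(nrow(Txy))
for (j in seq_len(nrow(Txy))) {
  d_loop[j] <- sqrt((Txy[j, 1] - e[2])^2 + (Txy[j, 2] - e[1])^2)
}

# Vectorized version, as in myfunction: all distances in one expression
d_vec <- sqrt((Txy[, 1] - e[2])^2 + (Txy[, 2] - e[1])^2)

print(all.equal(d_loop, d_vec))
print(which.min(d_vec) == which.min(d_loop))
```

One difference to be aware of: which.min returns only the first minimum, whereas which(distance == min(distance)) returns every tied index.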

Notes:

I vectorized myfunction which should make it more efficient. Hopefully it's nearly correct.
I also modified myfunction to return a vector of indices into "T" to reduce the amount of data sent back to the master.
The splitRows function from the parallel package is used to split the first two columns of "E" into a list of submatrices.
splitRows isn't exported by parallel, so I used ':::'. If this offends you, then use the splitRows function from snow which is exported.
The first two columns of "T" are exported to each of the workers since each task requires the entire first two columns.
clusterApply is used rather than parLapply since we need to iterate over submatrices of E.
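
If you would rather avoid ':::' without pulling in snow, one possible sketch is to build the row blocks yourself with splitIndices, which parallel does export. The matrix and the chunk count below are made up, and I'm assuming contiguous blocks of roughly equal size are all that is needed.

```r
library(parallel)

E <- matrix(seq_len(40), nrow = 10)   # made-up data, 10 rows x 4 columns
n_chunks <- 3                         # e.g. length(cl) for a real cluster

# splitIndices() returns a list of index vectors that partition 1:nrow(E)
idx <- splitIndices(nrow(E), n_chunks)

# drop = FALSE keeps each piece a matrix even if a block has a single row
ExyList <- lapply(idx, function(i) E[i, 1:2, drop = FALSE])

print(sapply(ExyList, nrow))
```

The resulting ExyList can be passed to clusterApply in the same way as the one produced by splitRows.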
