Using multiple cores to run Latin hypercube sampling in parallel

Question

I intend to extract representative samples from several populations (a, b, c, d, ...; see below) using the "clhs" package in R. Sampling takes a very long time on my multicore computer, so I would like to run the sampling procedures in parallel, using multiple CPU cores simultaneously.

These are some of my (example) data frames ("populations") from which I want to draw the samples:

a <- as.data.frame(replicate(1000, rnorm(20)))
b <- as.data.frame(replicate(1000, rnorm(20)))
c <- as.data.frame(replicate(1000, rnorm(20)))
d <- as.data.frame(replicate(1000, rnorm(20)))

The clhs code I want to run is:

clh_a <- clhs(x=a, size=round(nrow(a)/5), iter=2000, simple=FALSE) # 20% of all samples should be selected
clh_b <- clhs(x=b, size=round(nrow(b)/5), iter=2000, simple=FALSE)

etc...

What is the way to run this sampling process in parallel? Or is there another way of doing this in an efficient manner?

Addendum (many thanks to "zipfzapf"):

I tried to use "parLapply". Unfortunately, R throws an error at the end: "Error in length(x): 'x' is missing", which I honestly don't understand. Any ideas?

My code:

    library("snow")
    a <- as.data.frame(replicate(1000, rnorm(20)))
    b <- as.data.frame(replicate(1000, rnorm(20)))
    c <- as.data.frame(replicate(1000, rnorm(20)))
    d <- as.data.frame(replicate(1000, rnorm(20)))
    abcd <- list(a, b, c, d)
    cl <- makeCluster(4)
    results <- parLapply(cl,
       X = abcd,
       FUN = function(i) {
         clhs(x = i, size = round(nrow(i) / 5), iter = 2000, simple = FALSE)
       },
    )

Roman Luštrik · Accepted Answer · 2012-12-21T16:36:46

This works for me (notice I changed the number of iterations to make things move along at a reasonable pace).

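library(snowfall)
# Start four socket-based workers and load clhs on each of them.
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
sfLibrary(clhs)

# abcd is the list of data frames defined in the question.
x <- sfLapply(abcd, fun = function(x) {
    clhs(x = x, size = round(nrow(x) / 5), iter = 200, simple = FALSE)
})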

     Length Class       Mode
[1,] 5      cLHS_result list
[2,] 5      cLHS_result list
[3,] 5      cLHS_result list
[4,] 5      cLHS_result list
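
For comparison, here is a minimal sketch of the same idea using the base "parallel" package instead of snow or snowfall. It assumes clhs is installed; the worker count (4) and the seed (42) are arbitrary choices for illustration. One plausible cause of the error in the question, not verified here, is that the data never reached the argument parLapply expects (the call names its arguments X and FUN and ends with a trailing comma). Passing the list and the function positionally sidesteps both, and loading clhs on each worker makes the function available there.

```r
library(parallel)  # ships with R: makeCluster(), parLapply(), stopCluster()

# Four example populations, built the same way as in the question.
abcd <- replicate(4, as.data.frame(replicate(1000, rnorm(20))), simplify = FALSE)

cl <- makeCluster(4)                       # assumption: four workers
invisible(clusterEvalQ(cl, library(clhs))) # load clhs on every worker
clusterSetRNGStream(cl, 42)                # arbitrary seed, for reproducible draws

# The list and the function are passed positionally, with no trailing comma.
results <- parLapply(cl, abcd, function(pop) {
  clhs(x = pop, size = round(nrow(pop) / 5), iter = 200, simple = FALSE)
})

stopCluster(cl)                            # release the worker processes
```

Each element of results corresponds to one population, in the same order as abcd. Whether this is faster than a plain lapply depends on how long each clhs call takes relative to the cost of shipping the data to the workers.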
