I want to extract representative samples from several populations (a, b, c, d, ...; see below) using the "clhs" package in R. The sampling takes very long on my multicore computer, so I'd like to run the sampling procedures in parallel, using multiple CPU cores simultaneously.
These are some of my (example) data frames ("populations") from which I want to draw the samples:
a <- as.data.frame(replicate(1000, rnorm(20)))
b <- as.data.frame(replicate(1000, rnorm(20)))
c <- as.data.frame(replicate(1000, rnorm(20)))
d <- as.data.frame(replicate(1000, rnorm(20)))
The clhs code I want to run is:
clh_a <- clhs(x = a, size = round(nrow(a) / 5), iter = 2000, simple = FALSE) # 20% of all samples should be selected
clh_b <- clhs(x = b, size = round(nrow(b) / 5), iter = 2000, simple = FALSE)
etc...
How can I run this sampling process in parallel? Or is there another, more efficient way of doing this?
Addendum (many thanks to "zipfzapf"):
I tried to use "parLapply". Unfortunately, R ends with the error message "Error in length(x): 'x' is missing", which I honestly don't understand. Any ideas?
My code:
library("snow")
a <- as.data.frame(replicate(1000, rnorm(20)))
b <- as.data.frame(replicate(1000, rnorm(20)))
c <- as.data.frame(replicate(1000, rnorm(20)))
d <- as.data.frame(replicate(1000, rnorm(20)))
abcd <- list(a, b, c, d)
cl <- makeCluster(4)
results <- parLapply(cl,
                     X = abcd,
                     FUN = function(i) {
                       clhs(x = i, size = round(nrow(i) / 5), iter = 2000, simple = FALSE)
                     },
)
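A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: as far as I can tell, snow's parLapply is declared as parLapply(cl, x, fun, ...) with lowercase argument names, so the named arguments X = and FUN = would fall into "..." and leave x unset, which matches the "'x' is missing" message. Passing the list and the function by position sidesteps the naming question. The version below makes several assumptions: it uses the base "parallel" package in place of "snow", it loads "clhs" on every worker via clusterEvalQ (the workers start as fresh R sessions), it drops the trailing comma, and it names the list elements for convenience.

```r
library(parallel)  # ships with R; swapped in for snow here (an assumption)

# Example populations, as in the question (seed added only for reproducibility)
set.seed(1)
abcd <- list(
  a = as.data.frame(replicate(1000, rnorm(20))),
  b = as.data.frame(replicate(1000, rnorm(20))),
  c = as.data.frame(replicate(1000, rnorm(20))),
  d = as.data.frame(replicate(1000, rnorm(20)))
)

cl <- makeCluster(4)

# Each worker is a separate R session, so clhs must be loaded there too
invisible(clusterEvalQ(cl, library(clhs)))

# Pass the list and the function positionally to avoid argument-name mismatches
results <- parLapply(cl, abcd, function(i) {
  clhs(x = i, size = round(nrow(i) / 5), iter = 2000, simple = FALSE)
})

# Release the worker processes when done
stopCluster(cl)
```

With this layout, results is a list with one clhs result per population, in the same order as abcd. Four workers match the four data frames here; with more populations than cores, parallel::detectCores() could be used to pick the cluster size instead.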