I need to analyse a group of clients, say 2783 of them. I have R code written for a generic client, and all the data the program needs to calculate the different variables is in a database linked to the workspace. The code for a single client must run sequentially, since many of the variables depend on each other. Each client takes about 1 minute to run. My computer has 8 logical processors, and R only uses 1 unless it is run in parallel.

What I haven't found an answer to on the internet yet is how to dispatch the clients from a batch file: client 1 to the first processor, and so on up to client 8 on the eighth processor. Only when a processor is done should it write a log file with some specifics about the run and move on to the next client. For example, when processor 1 finishes client 1, it moves on to client 9 (since the other 7 processors have already started on clients 2 to 8 of the batch list).

A given processor must start and finish any client it has picked up, because of the dependencies mentioned above. Each week I have a similar number of clients to analyse.

So the problem is one of batch processing R code in parallel to make full use of the computer's processing power.

At this rate, running the whole client extract of about 2800 people would take almost 2 days around the clock! Just using the 8 cores would cut this by around 88% to approximately 6 hours, and with batch processing those would be 6 hours in which I can focus on other work.

Thanks in advance!

You might want to take a look at future.apply [r-bloggers.com/…] if you're using the apply family of functions, or furrr [github.com/DavisVaughan/furrr] if you use the tidyverse. - Dom
The first link redirects to R-bloggers, but not to a specific entry on an issue like this. The second link is broken. - juan diluca haltrich

1 Answer

As mentioned in the comments, furrr is a practical choice: it builds on the tidyverse's purrr and adds the multiprocessing capabilities of future:

library(furrr)   # future_map(), a parallel counterpart of purrr::map()
library(dplyr)   # provides the %>% pipe

# Run on 3 background R sessions; set workers = 8 to use 8 logical processors
plan(multisession, workers = 3)
nbrOfWorkers()
#[1] 3

clients <- as.list(1:9)
system.time(
  results <- clients %>% future_map(~{
    client <- .x
    cat("processing client", client, "\n")
    # Long processing
    # source('ScripttoProcess.R')
    Sys.sleep(5)
    # Save results
    paste("Client ", client, " results")
  }, .progress = TRUE)
)

Progress: ──────────────────────────────────────────────────────────────── 100%

processing client 1 
processing client 2 
processing client 3 
processing client 4 
processing client 5 
processing client 6 
processing client 7 
processing client 8 
processing client 9 

   user  system elapsed 
   0.09    0.00   15.47
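
The question also asks for a log file per client and for a worker to pick up the next client as soon as it is free. A minimal sketch of that part follows, and it swaps in a different technique from the furrr example above: base R's parallel package with clusterApplyLB(), which hands out one client at a time. The function process_client(), the log file names, and the log fields are all hypothetical placeholders, and Sys.sleep() stands in for the real per-client script.

```r
library(parallel)

# Hypothetical stand-in for the real per-client work. In practice the body
# of the tryCatch would run the actual analysis for this client.
process_client <- function(client_id, log_dir) {
  started <- Sys.time()
  status <- tryCatch({
    Sys.sleep(0.2)  # placeholder for the ~1 minute of real processing
    "ok"
  }, error = function(e) paste("error:", conditionMessage(e)))
  finished <- Sys.time()

  # One log file per client, written by the worker that processed it
  log_file <- file.path(log_dir, sprintf("client_%04d.log", client_id))
  writeLines(c(
    sprintf("client: %d", client_id),
    sprintf("worker_pid: %d", Sys.getpid()),
    sprintf("started: %s", format(started)),
    sprintf("finished: %s", format(finished)),
    sprintf("status: %s", status)
  ), log_file)

  data.frame(client = client_id, pid = Sys.getpid(), status = status)
}

clients <- 1:9                        # assumption: would be the full client list
log_dir <- file.path(tempdir(), "client_logs")
dir.create(log_dir, showWarnings = FALSE)

n_workers <- 3                        # assumption: 8 on the machine in the question
cl <- makeCluster(n_workers)
# Load-balanced apply: each worker receives a single client and is given
# the next one only after it has finished the current one.
results <- clusterApplyLB(cl, clients, process_client, log_dir = log_dir)
stopCluster(cl)

# Combine the per-client summaries into one table
summary_table <- do.call(rbind, results)
print(summary_table)
```

Saved as a script (say run_clients.R, a hypothetical name), this could be launched from a batch file with Rscript run_clients.R. Each client is processed start to finish by a single worker, and the pid column shows which worker handled which client.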