I am trying to use `caret` to cross-validate an elastic net model using the `glmnet` implementation on an Ubuntu machine with 8 CPU cores & 32 GB of RAM. When I train sequentially, I max out CPU usage on one core but use only about 50% of the memory on average.
When I use `registerDoMC(cores = xxx)` from the `doMC` package, do I need to worry about registering only `xxx = floor(100/y)` cores, where `y` is the memory usage of a single-core run (as a % of total RAM), in order not to run out of memory?

Does `caret` have any heuristics that allow it to figure out the maximum number of cores to use?

Is there a set of heuristics I can use to dynamically adjust the number of cores so that I use my computing resources optimally across different data sizes and model complexities?
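For concreteness, here is a minimal sketch of the `floor(100/y)` heuristic I have in mind, assuming `y` has been measured from a sequential run (the ~50% figure above); the `iris` data and the tuning setup are purely illustrative:

```r
library(caret)
library(doMC)

y <- 50                                  # % of RAM one sequential run used (measured)
cores <- min(parallel::detectCores(),
             floor(100 / y))             # the floor(100/y) cap described above
registerDoMC(cores = cores)

## caret picks up the registered foreach backend automatically
fit <- train(Species ~ ., data = iris, method = "glmnet",
             trControl = trainControl(method = "cv", number = 10))
```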
Edit:

FWIW, attempting to use all 8 cores made my machine unresponsive. Clearly `caret` does not check whether spawning `xxx` processes is likely to be problematic. How can I then choose the number of cores dynamically?
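One possible dynamic approach (an assumption-laden sketch, not anything built into `caret`): cap the worker count by how many copies of the sequential run's memory footprint fit into the memory currently free. This is Linux-only because it reads `/proc/meminfo`, and `per_worker_kb` has to be measured yourself, e.g. the peak resident size of a sequential run; ~16 GB here matches the 50%-of-32 GB figure above:

```r
library(doMC)

## MemAvailable from /proc/meminfo, in kB
## (Linux-specific; assumes a kernel new enough to report MemAvailable)
meminfo      <- readLines("/proc/meminfo")
available_kb <- as.numeric(sub("^MemAvailable:[^0-9]*([0-9]+).*$", "\\1",
                               grep("^MemAvailable:", meminfo, value = TRUE)))

per_worker_kb <- 16 * 1024^2   # measured footprint of one sequential run (~16 GB)
safe_cores <- max(1, min(parallel::detectCores(),
                         floor(available_kb / per_worker_kb)))
registerDoMC(cores = safe_cores)
```

Whatever footprint estimate is used, the key point is taking the minimum of the core count and the memory-derived cap, and re-measuring the footprint when the data size changes.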
```r
library(parallel)
detectCores()
```

This is a way to determine how many cores are available. The handling of the return value is OS-dependent, but it would be interesting to know how many cores R thinks are available in your setup; it may return a number smaller than 8. – Mike
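For what it's worth, `detectCores()` counts logical CPUs by default; passing `logical = FALSE` asks for physical cores instead (and may return `NA` on some platforms):

```r
library(parallel)
detectCores()                  # logical cores, hyper-threads included
detectCores(logical = FALSE)   # physical cores; may be NA on some OSes
```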