I'm following the tutorial Introduction to Machine Learning with R and caret (https://www.youtube.com/watch?v=z8PRU46I3NY) and see different behaviour when running R in parallel with doSNOW on macOS compared to CentOS:
library(doSNOW)
library(caret)

# create a 4-worker socket cluster and register it as the parallel backend
cl = makeCluster(4, type = 'SOCK')
registerDoSNOW(cl)

# build model
caret.cv = train(Survived ~ .,
                 data = titanic.train,
                 method = 'xgbTree',
                 tuneGrid = tune.grid,
                 trControl = train.control)

stopCluster(cl)
On macOS this creates 4 processes, each running 1 thread, so 4 cores at >99% (xgbTree finishes in ~6 min). On CentOS it creates 4 processes, each running 24 threads, so 24 threads at >99% in total (xgbTree does not finish after well over 30 min). Even when creating a cluster of only one or two workers on CentOS, all threads are used and the server is completely busy.
UPDATE: When running non-caret code on doSNOW clusters, everything works fine: one thread per process, even on CentOS.
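For reference, this is the kind of non-caret test I mean (a minimal sketch; the actual loop body doesn't matter, any CPU-bound foreach shows the same one-thread-per-worker behaviour):

```r
library(doSNOW)
library(foreach)

cl = makeCluster(4, type = 'SOCK')
registerDoSNOW(cl)

# CPU-bound dummy work: in htop each of the 4 workers
# stays on a single thread, on macOS and on CentOS alike
res = foreach(i = 1:4, .combine = c) %dopar% {
  sum(sqrt(seq_len(1e7)))
}

stopCluster(cl)
```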
Is there anything I'm missing? Should I expect different behaviour on these systems with identical scripts? Do I need to specify something for use on CentOS?
I'm very new to caret and parallel R; so far I've only read that the bigger differences are between macOS/Linux and Windows.
Please let me know if I can get you additional info. Thanks for your help and suggestions.
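One suspicion I have (unverified) is that xgboost's own OpenMP threading multiplies with the SNOW workers on CentOS, while the macOS build runs xgboost single-threaded. A sketch of what I'm considering trying, assuming caret forwards extra train() arguments such as nthread to the underlying xgb.train call:

```r
library(doSNOW)
library(caret)

# Assumption: cap xgboost's internal OpenMP threads so each of the
# 4 SNOW workers uses 1 thread instead of all 24.
Sys.setenv(OMP_NUM_THREADS = "1")  # inherited by locally spawned workers

cl = makeCluster(4, type = 'SOCK')
registerDoSNOW(cl)

caret.cv = train(Survived ~ .,
                 data = titanic.train,
                 method = 'xgbTree',
                 tuneGrid = tune.grid,
                 trControl = train.control,
                 nthread = 1)  # xgboost's thread parameter, passed through ...

stopCluster(cl)
```

I haven't confirmed on the CentOS box that this is the cause; mentioning it in case it narrows things down.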
htop on CentOS shows 60+ of: R --slave --no-restore --file=/usr/lib64/R/library/snow/RSOCKnode.R --args MASTER=localhost PORT=11326 OUT=/dev/null SNOWLIB=/usr/lib64/R/library
R version 3.3.2: x86_64-redhat-linux-gnu / x86_64-apple-darwin13.4.0
CentOS server: 2 sockets, 6 cores each, 2 threads per core (24 threads total)
macOS MBP: 1/8/1