
I am trying to run h2o.automl(), but it keeps failing because I am running out of ncpus.

I initiate my H2O session by requesting 47 threads: h2o.init(nthreads=47)

I am providing a sufficient amount of ncpus and memory at the start:

    R is connected to the H2O cluster:
        H2O cluster uptime:         2 seconds 286 milliseconds
        H2O cluster timezone:       Europe/London
        H2O data parsing timezone:  UTC
        H2O cluster version:        3.18.0.4
        H2O cluster version age:    18 days
        H2O cluster name:           H2O_started_from_R_cmorgan1_gvi181
        H2O cluster total nodes:    1
        H2O cluster total memory:   26.67 GB
        H2O cluster total cores:    40
        H2O cluster allowed cores:  40
        H2O cluster healthy:        TRUE
        H2O Connection ip:          localhost
        H2O Connection port:        54321
        H2O Connection proxy:       NA
        H2O Internal Security:      FALSE
        H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
        R Version:                  R version 3.4.1 (2017-06-30)

However, after a while (38% completion) it cuts out and tells me I do not have enough ncpus:

    |======================================================================| 100%
    |===========================                                           |  38%
    =>> PBS: job killed: ncpus 33.43 exceeded limit 32 (sum)

    ============================================

        Job resource usage summary

                     Memory (GB)    NCPUs
        Requested  :     45           48
        Used       :     12 (peak)    36.00 (ave)

Has anyone come across this before, and do you have a workaround? I do not believe my data is abnormally sized: it has 29 scaled parameters and 94,000 rows.

Thanks in advance,

It seems there are 40 cores (H2O cluster total cores: 40). Did you try h2o.init(nthreads=32)? – Selcuk Akbas

1 Answer


This has nothing to do with H2O.

The clue here is the message "PBS: job killed".

A quick search suggests that your job is running under the PBS scheduler (https://en.wikipedia.org/wiki/Portable_Batch_System) and that PBS is what is killing it. (I've never actually seen anybody use PBS before, but this seems very likely based on the information above.)

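PBS is telling you your limit is 32 cores. Your h2o.init(nthreads=47) call was capped at the 40 cores the machine has (H2O cluster allowed cores: 40), which is still above what PBS allows, and the job was measured at 33.43 CPUs when it was killed. So specify a value below 32 with a little headroom: with h2o.init(nthreads=30), PBS may no longer kill your process.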
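If the same script is submitted with different CPU requests, one option is to derive nthreads from the job's limit instead of hard-coding it. The following is a minimal sketch, not part of the h2o package: safe_nthreads is a hypothetical helper, the two-thread headroom is an illustrative choice, and it assumes your PBS flavor exports the CPU allocation as NCPUS (PBS Pro) or PBS_NUM_PPN (Torque). Check your site's documentation, and note that in this question the summary shows 48 NCPUs requested against an enforced limit of 32, so passing the enforced limit explicitly is the safer route.

```r
# Hypothetical helper (not part of the h2o package): choose an nthreads value
# that stays a little below the CPU limit PBS enforces on the job.
#   limit    - enforced CPU limit; if NA, try to read it from the environment
#   headroom - threads to leave unused below the limit (illustrative default)
#   default  - value returned when no limit can be determined
safe_nthreads <- function(limit = NA_integer_, headroom = 2L, default = 30L) {
  if (is.na(limit)) {
    # Assumption: PBS Pro exports NCPUS and Torque exports PBS_NUM_PPN.
    for (var in c("NCPUS", "PBS_NUM_PPN")) {
      value <- suppressWarnings(as.integer(Sys.getenv(var, unset = "")))
      if (!is.na(value) && value > 0L) {
        limit <- value
        break
      }
    }
  }
  if (is.na(limit) || limit <= 0L) {
    return(as.integer(default))
  }
  # Never request fewer than one thread.
  max(1L, as.integer(limit) - as.integer(headroom))
}
```

With the limit from this job's PBS message, h2o.init(nthreads = safe_nthreads(32)) would request 30 threads, the same value suggested above.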