
I am running the R package h2o, version 3.20.0.2, on an Azure cluster.

After fitting many h2o models, the h2o cluster seems to have become unresponsive and shows this warning:

Warning in .h2o.__checkConnectionHealth() : H2O cluster node 127.0.0.1:54321 is behaving slowly and should be inspected manually.

I have tried to reset the cluster with h2o.shutdown(), but the problem persists and h2o.init() fails. Without admin rights, how can I truly restart the h2o server, and how can I avoid this problem in the future?


1 Answer


The most common reason for this is that you have used all the memory available to the H2O cluster.

Options include:

  • asking for a larger cluster size when you start it
  • calling h2o.rm or h2o.removeAll to remove in-memory objects to free up space
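Both options can be sketched in R roughly as follows. This is only an illustration: the helper names (start_h2o, fit_with_cleanup), the "8g" memory size, and the choice of h2o.glm as the model are assumptions, not something taken from your setup. max_mem_size only applies when h2o.init() launches the H2O instance itself.

```r
# Sketch only: assumes the h2o package is installed and that h2o.init()
# starts the local H2O instance (so max_mem_size takes effect).

# Ask for more memory at startup. "8g" is a placeholder value.
start_h2o <- function(mem = "8g") {
  h2o::h2o.init(max_mem_size = mem)
}

# Fit several models, keep only a small summary of each, and remove the
# model from H2O memory straight away instead of letting them pile up.
fit_with_cleanup <- function(train, x, y, alphas = c(0, 0.5, 1)) {
  results <- list()
  for (a in alphas) {
    model <- h2o::h2o.glm(x = x, y = y, training_frame = train, alpha = a)
    results[[as.character(a)]] <- h2o::h2o.rmse(model)
    h2o::h2o.rm(model@model_id)  # free the model on the H2O side
  }
  results
}

# To wipe everything held by the cluster (frames and models alike):
# h2o::h2o.removeAll()
```

h2o.ls() lists the keys currently held in the cluster, which helps to check whether objects are accumulating between fits.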

h2o.shutdown() uses an API call to the backend to do a cooperative shutdown, so if the backend is already in a bad state it may not work.

If you are running R on the same host as the H2O server, you can call system("ps -ef") in R to run Linux shell commands and fix things up that way, even without a direct terminal prompt. Find the h2o java process and kill it. You can kill processes that you own without admin rights.
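A minimal sketch of that approach, assuming a Linux host and that the H2O process shows up in ps -ef with "h2o.jar" on its command line (check the ps output first; the pattern and the helper names find_h2o_pids and kill_h2o are assumptions for illustration):

```r
# Extract PIDs from `ps -ef` output lines whose command matches `pattern`.
# In `ps -ef` output the PID is the second whitespace-separated column.
find_h2o_pids <- function(ps_lines, pattern = "java.*h2o\\.jar") {
  hits <- grep(pattern, ps_lines, value = TRUE)
  if (length(hits) == 0) return(integer(0))
  fields <- strsplit(trimws(hits), "\\s+")
  as.integer(vapply(fields, function(f) f[2], character(1)))
}

# Find the H2O java process(es) and kill them: SIGTERM first, then
# SIGKILL for anything that ignores the polite request.
kill_h2o <- function(wait_secs = 5) {
  pids <- find_h2o_pids(system("ps -ef", intern = TRUE))
  if (length(pids) == 0) return(invisible(pids))
  tools::pskill(pids, tools::SIGTERM)
  Sys.sleep(wait_secs)
  tools::pskill(pids, tools::SIGKILL)
  invisible(pids)
}
```

After calling kill_h2o(), a fresh h2o.init() should be able to start a new instance. This only works for processes owned by your own user, which is the case when R launched H2O itself.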