1 vote

I'm trying to set up an H2O cloud on a 4-data-node Hadoop/Spark cluster, using R in a Zeppelin notebook. I found that I have to give each executor at least 20 GB of memory before my R paragraph stops complaining about running out of memory (a Java GC out-of-memory error).

Is it expected that I need 20 GB of memory per executor to run an H2O cloud? Or are there configuration entries I can change to reduce the memory requirement?
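For reference, these are the kinds of settings I mean. This is only a sketch assuming sparklyr plus the rsparkling package; the exact function names vary between versions:

    # Sketch only: assumes sparklyr + rsparkling; adjust to your versions.
    library(sparklyr)
    library(rsparkling)   # Sparkling Water bindings for R
    library(h2o)

    config <- spark_config()
    config$spark.executor.instances <- 4       # one executor per data node
    config$spark.executor.memory    <- "8g"    # per-executor JVM heap
    config$spark.yarn.executor.memoryOverhead <- "2048"  # off-heap headroom, in MB

    sc <- spark_connect(master = "yarn-client", config = config)
    hc <- h2o_context(sc)   # starts the H2O cloud inside the Spark executors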

The memory requirements depend on the size of your data; how big is your data? - Erin LeDell
We used the stock h2o dataset from GitHub (load.csv), which is only 16.6 MB. - Tim To

1 Answer

0 votes

There isn't enough information in this post to give specifics. But I will say that the presence of Java GC messages is not necessarily a problem, especially at startup. It's normal to see a flurry of GC messages early in a Java program's life as the heap expands from nothing to its steady-state working size.

A sign that Java GC really is becoming a major problem is when you see back-to-back full GC cycles, each taking seconds or more of wall-clock time.
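If it helps to confirm which case you are in, you can turn on GC logging for the executors. One way (again just a sketch; the flags shown are for Java 8) is to pass the options through the Spark config before connecting:

    # Sketch: enable executor GC logging (Java 8 flags) so you can see
    # whether back-to-back full GC cycles are actually happening.
    config$spark.executor.extraJavaOptions <-
      "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

The GC lines then show up in the YARN executor logs, where long pauses and repeated "Full GC" entries are easy to spot.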