I am trying to train a machine learning model with H2O (3.14). My dataset size is 4Gb and my computer RAM is 2Gb with 2G swap, JDK 1.8. Refer to this article, H2O can process a huge dataset with 2Gb RAM.
- A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you're using more Big Data than physical DRAM. We won't die with a GC death-spiral, but we will degrade to out-of-core speeds. We'll go as fast as the disk will allow. I've personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to run a Logistic Regression.
Some questions around this issues:
- Loading data bigger than the memory size in h2o . The answer mentioned user-mode swap-to-disk was disabled since performance was so bad. However, he did not explain any alternative method and how can enable the flags
--cleaner
in h2o?
Work around 1:
I configured the java heap with options java -Xmx10g -jar h2o.jar
. When I load dataset. The H2O information as follows:
However, JVM consumed all RAM memory and Swap, then operating system halted java h2o program.
Work around 2:
I installed H2O spark. I can load dataset but spark was hanging with the following logs with a full swap memory:
+ FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.377 192.168.233.133:54321 6965 Thread-47 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.377 192.168.233.133:54321 6965 Thread-48 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.381 192.168.233.133:54321 6965 Thread-45 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.3 MB + FREE:426.7 MB == MEM_MAX:2.67 GB), desiredKV=803.2 MB OOM!
09-01 02:01:12.382 192.168.233.133:54321 6965 Thread-46 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.4 MB + FREE:426.5 MB == MEM_MAX:2.67 GB), desiredKV=840.9 MB OOM!
09-01 02:01:12.384 192.168.233.133:54321 6965 #e Thread WARN: Swapping! GC CALLBACK, (K/V:1.75 GB + POJO:513.4 MB + FREE:426.5 MB == MEM_MAX:2.67 GB), desiredKV=802.7 MB OOM!
09-01 02:01:12.867 192.168.233.133:54321 6965 FJ-3-1 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.4 MB + FREE:426.5 MB == MEM_MAX:2.67 GB), desiredKV=1.03 GB OOM!
09-01 02:01:13.376 192.168.233.133:54321 6965 Thread-46 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=803.2 MB OOM!
09-01 02:01:13.934 192.168.233.133:54321 6965 Thread-45 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.867 192.168.233.133:54321 6965 #e Thread WARN: Swapping! GC CALLBACK, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=803.2 MB OOM!
In this case, I think the gc
collector is waiting for cleaning some unused memory in swap.
How can I process huge dataset with a limited RAM memory ?
r
tag? – shmoselr
tag. My client is anr
program – lotusirous