I have a [210,000 x 500] sparse matrix in R that I'm trying to cluster using h2o. I assumed that a 210,000-row matrix is not that large for h2o, but when I try to import it into the h2o instance it takes a very long time (I let it run for over 10 minutes and stopped it before completion). When I subset the first 10,000 rows of the sparse matrix and import those, it takes only a few seconds. I've also tried importing incrementally larger subsets, and the time grows quickly; I gave up around 60,000 rows. Is this normal, or am I doing something wrong?
Here's what I'm using:
library(h2o)
localH2O <- h2o.init(nthreads = -1, max_mem_size = "16g")
spmx.h2o <- as.h2o(sparse_mx)
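For reference, this is roughly how I've been timing the subset imports (a minimal sketch, assuming sparse_mx is a dgCMatrix built with the Matrix package and using the same cluster started above; the row counts are just examples):
library(Matrix)  # sparse_mx is a dgCMatrix
# time progressively larger subsets of the same sparse matrix;
# 10,000 rows imports in a few seconds, but it slows down well before 210,000
for (n in c(10000, 30000, 60000)) {
  print(system.time(chunk.h2o <- as.h2o(sparse_mx[1:n, ])))
  h2o.rm(chunk.h2o)  # free the frame on the cluster before the next round
}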
Below is more info about the h2o instance once it has started:
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Starting H2O JVM and connecting: . Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 6 seconds 779 milliseconds
H2O cluster version: 3.10.4.6
H2O cluster version age: 1 month and 30 days
H2O cluster name: H2O_started_from_R_M_vto433
H2O cluster total nodes: 1
H2O cluster total memory: 14.22 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 3.4.0 (2017-04-21)
I'm trying to avoid writing the matrix to a file and importing it again, simply because I think 210,000 rows and 500 columns should not be a problem for h2o to handle.