I am relatively new to H2O and am trying to use XGBoost with grid search. I am running on an edge node with 40 cores and 26 GB of memory, using version 3.20.0.2 of both the h2o R package and H2O, with CPU only as the backend.
I have run GBM and random forest without issues (some GBM grid searches take about 2 hours to finish, and they all ran fine). However, whenever I try to run XGBoost with grid search, I get an error.
A simple XGBoost example without grid search runs fine. With grid search, however, I always get the error "Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: Recv failure: Connection was reset".
I searched online to figure out what is going on and found two examples, both given by LeDell. One works and the other does not.
The code below, taken from https://gist.github.com/ledell/71e0b8861d4fa35b59dde2af282815a5, fails in R with "Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: Recv failure: Connection was reset":
library(h2o)
h2o.init()
# Load the HIGGS dataset
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
y <- "response"
x <- setdiff(names(train), y)
family <- "binomial"
# For binary classification, the response should be a factor
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])
# Some XGBoost/GBM hyperparameters
hyper_params <- list(ntrees = seq(10, 1000, 1),
                     learn_rate = seq(0.0001, 0.2, 0.0001),
                     max_depth = seq(1, 20, 1),
                     sample_rate = seq(0.5, 1.0, 0.0001),
                     col_sample_rate = seq(0.2, 1.0, 0.0001))
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = 10,
                        seed = 1)
# Train the grid
xgb_grid <- h2o.grid(algorithm = "xgboost",
                     x = x, y = y,
                     training_frame = train,
                     nfolds = 5,
                     seed = 1,
                     hyper_params = hyper_params,
                     search_criteria = search_criteria)
# Sort the grid by CV AUC
grid <- h2o.getGrid(grid_id = xgb_grid@grid_id, sort_by = "AUC", decreasing = TRUE)
grid_top_model <- grid@summary_table[1, "model_ids"]
In addition, the H2O process on my edge node printed the following:
libgomp: Thread creation failed: Resource temporarily unavailable
[thread 140207508600576 also had an error]
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=xxxxxxxxxxx, pid=40095, tid=0x00007f849aaea700
[thread 140207503337216 also had an error]
[thread 140207504389888 also had an error]
JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [libc.so.6+0x358e5] exit+0x35
However, I have no issue when I run the code below (this is also an example given by LeDell, in another post):
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
y <- "response"
x <- setdiff(names(train), y)
train[,y] <- as.factor(train[,y])
hyperparameters_xgboost <- list(ntrees = seq(10, 20, 10),
                                learn_rate = seq(0.1, 0.2, 0.1),
                                sample_rate = seq(0.9, 1.0, 0.1),
                                col_sample_rate = seq(0.5, 0.6, 0.1))
xgb <- h2o.grid("xgboost",
                x = x,
                y = y,
                seed = 1,
                training_frame = train,
                max_depth = 3,
                hyper_params = hyperparameters_xgboost)
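For comparison, the two search spaces can be sized in base R without an H2O cluster. This is only a rough sketch: `summarise_space` is a hypothetical helper (not part of h2o), and the fit counts assume that each cross-validated model trains one model per fold plus a final model, and that a grid without `search_criteria` is searched exhaustively.

```r
# Search spaces copied from the two examples in this post.
failing_space <- list(
  ntrees          = seq(10, 1000, 1),
  learn_rate      = seq(0.0001, 0.2, 0.0001),
  max_depth       = seq(1, 20, 1),
  sample_rate     = seq(0.5, 1.0, 0.0001),
  col_sample_rate = seq(0.2, 1.0, 0.0001)
)
working_space <- list(
  ntrees          = seq(10, 20, 10),
  learn_rate      = seq(0.1, 0.2, 0.1),
  sample_rate     = seq(0.9, 1.0, 0.1),
  col_sample_rate = seq(0.5, 0.6, 0.1)
)

# Hypothetical helper (not an h2o function): size of a hyperparameter list.
summarise_space <- function(space) {
  n_values <- vapply(space, length, integer(1))
  list(
    n_values       = n_values,                    # candidate values per parameter
    n_combinations = prod(as.numeric(n_values)),  # numeric avoids integer overflow
    max_ntrees     = max(space$ntrees)            # largest forest the search may request
  )
}

failing <- summarise_space(failing_space)
working <- summarise_space(working_space)

# Assumption: RandomDiscrete with max_models = 10 and nfolds = 5 trains at most
# 10 grid models, each with 5 fold models plus 1 final model.
failing_fits <- 10 * (5 + 1)
# Assumption: no search_criteria means every combination is trained once, without CV.
working_fits <- working$n_combinations

print(failing$n_values)
print(working$n_values)
cat("failing example: up to", failing_fits, "fits, ntrees up to", failing$max_ntrees, "\n")
cat("working example:", working_fits, "fits, ntrees up to", working$max_ntrees, "\n")
```

If these assumptions hold, the failing search can request far larger models (up to 1000 trees, depth up to 20, with 5-fold cross-validation) than the working one (at most 20 trees, depth 3, no cross-validation), so the difference between the two runs may lie in model size and cross-validation rather than in grid search as such.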
So I cannot tell what went wrong. Originally I thought XGBoost itself did not work, but then I had a successful run with XGBoost alone (no grid). Then I guessed it must be the grid search part, but the latter example ran successfully with grid search. I am out of ideas. Does anyone have any insight into this error?