1 vote

I am relatively new to H2O and was trying to use XGBoost with grid search. I am running on an edge node with 40 cores and 26 GB of memory, using version 3.20.0.2 of the h2o package in R (matching the H2O cluster version), with just the CPU backend.

I have run GBM and random forest without issues (some GBM grid searches take about 2 hours to finish, and they all ran fine). However, when I try to run XGBoost, I always get an error.

If I run a simple XGBoost example without grid search, it works. However, when I run XGBoost with grid search, I always get the error: "Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: Recv failure: Connection was reset".

I searched online to try to figure out what is going on, and found two examples, both posted by LeDell; one works but the other does not.

I get the error in R, "Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: Recv failure: Connection was reset", for the code below (from https://gist.github.com/ledell/71e0b8861d4fa35b59dde2af282815a5):

library(h2o)
h2o.init()


# Load the HIGGS dataset
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
y <- "response"
x <- setdiff(names(train), y)
family <- "binomial"

#For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])

# Some XGboost/GBM hyperparameters
hyper_params <- list(ntrees = seq(10, 1000, 1),
                     learn_rate = seq(0.0001, 0.2, 0.0001),
                     max_depth = seq(1, 20, 1),
                     sample_rate = seq(0.5, 1.0, 0.0001),
                     col_sample_rate = seq(0.2, 1.0, 0.0001))
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = 10, 
                        seed = 1)

# Train the grid
xgb_grid <- h2o.grid(algorithm = "xgboost",
                     x = x, y = y,
                     training_frame = train,
                     nfolds = 5,
                     seed = 1,
                     hyper_params = hyper_params,
                     search_criteria = search_criteria)


# Sort the grid by CV AUC
grid <- h2o.getGrid(grid_id = xgb_grid@grid_id, sort_by = "AUC", decreasing = TRUE)
grid_top_model <- grid@summary_table[1, "model_ids"]

In addition, on the edge node itself I see the error: libgomp: Thread creation failed: Resource temporarily unavailable [thread 140207508600576 also had an error]

A fatal error has been detected by the Java Runtime Environment: SIGSEGV (0xb) at pc=xxxxxxxxxxx, pid=40095, tid=0x00007f849aaea700 [thread 140207503337216 also had an error] [thread 140207504389888 also had an error]

JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12) Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode linux-amd64 compressed oops)

Problematic frame:

C [libc.so.6+0x358e5] exit+0x35

However, I get no error when I run the code below (this is also an example given by LeDell, in another post):

train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")

y <- "response"
x <- setdiff(names(train), y)

train[,y] <- as.factor(train[,y])

hyperparameters_xgboost <- list(ntrees = seq(10, 20, 10),
                                learn_rate = seq(0.1, 0.2, 0.1),
                                sample_rate = seq(0.9, 1.0, 0.1),
                                col_sample_rate = seq(0.5, 0.6, 0.1))

xgb <- h2o.grid("xgboost",
                x = x,
                y = y, 
                seed = 1,
                training_frame = train,
                max_depth = 3,
                hyper_params = hyperparameters_xgboost) 

Therefore, I cannot tell what went wrong. Originally I thought XGBoost itself did not work, but then I had a successful run with XGBoost alone (no grid). Next I guessed it must be the grid search part, but the second example above runs fine. I am out of ideas and wonder if someone has insight into this error.

You mentioned that you found two examples online (posted by me) and one works but not the other. Can you edit your post to add the example that does not work? Thanks! – Erin LeDell
Hi Erin, the one that does not run is in the post here: gist.github.com/ledell/71e0b8861d4fa35b59dde2af282815a5 – ASU_TY
Hi Erin, just wondering if this could be a version issue? I haven't had a chance to try an older version yet. Thank you! – ASU_TY
I have the same issue! As long as I am not using nfolds (using a validation frame instead), it works fine, but as soon as I add nfolds it stops and the cluster goes down! – EmmaStin

1 Answer

0 votes

I cannot reproduce this error on H2O 3.20.0.2:

> library(h2o)
> h2o.init()
 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         18 hours 58 minutes 
    H2O cluster timezone:       America/Los_Angeles 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.20.0.2 
    H2O cluster version age:    6 days  
    H2O cluster name:           H2O_started_from_R_me_ves048 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.28 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.5.0 (2018-04-23) 

> # Load the HIGGS dataset
> train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
  |=================================================================================================| 100%
> test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
  |=================================================================================================| 100%
> y <- "response"
> x <- setdiff(names(train), y)
> family <- "binomial"
> #For binary classification, response should be a factor
> train[,y] <- as.factor(train[,y])
> test[,y] <- as.factor(test[,y])
> # Some XGboost/GBM hyperparameters
> hyper_params <- list(ntrees = seq(10, 1000, 1),
+                      learn_rate = seq(0.0001, 0.2, 0.0001),
+                      max_depth = seq(1, 20, 1),
+                      sample_rate = seq(0.5, 1.0, 0.0001),
+                      col_sample_rate = seq(0.2, 1.0, 0.0001))
> search_criteria <- list(strategy = "RandomDiscrete",
+                         max_models = 10, 
+                         seed = 1)
> # Train the grid
> xgb_grid <- h2o.grid(algorithm = "xgboost",
+                      x = x, y = y,
+                      training_frame = train,
+                      nfolds = 5,
+                      seed = 1,
+                      hyper_params = hyper_params,
+                      search_criteria = search_criteria)
  |=================================================================================================| 100%
>
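That said, the "libgomp: Thread creation failed: Resource temporarily unavailable" message in your edge-node log suggests the JVM is hitting an OS limit on threads/processes rather than an H2O bug, which would also explain the SIGSEGV and the dropped connection. As a diagnostic sketch (assuming a Linux edge node), you could compare the per-user process/thread limit against how many threads are already in use:

```shell
# Per-user limit on processes + threads (nproc); libgomp fails with
# "Resource temporarily unavailable" when this is exhausted
ulimit -u

# System-wide ceiling on the number of threads
cat /proc/sys/kernel/threads-max

# Rough count of threads currently running (one line per thread with -L)
ps -eLf | wc -l
```

If the thread count is near the limit, raising `ulimit -u` before starting H2O, or lowering parallelism (e.g. `h2o.init(nthreads = ...)` with a value well below 40), may avoid the crash. XGBoost with `nfolds` trains several models concurrently and each may spawn its own set of OpenMP threads, so it can use far more threads than GBM or random forest, which would fit your observation (and EmmaStin's) that the failure only appears once nfolds is added.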