3
votes

I am trying to run algorithms in H2O because the dataset is quite large and it's a regression problem.

I am competing in a kernels-only competition, the Mercari Price Suggestion Challenge, so the code has to be run and checked in Kaggle Kernels only.

I am using R with 8 GB of RAM.

Initially I was able to run a GLM model and save the output CSV with the following code:

library(glm2)
glm.model2 <- h2o.glm(y = y.dep, x = x.indep,
                      training_frame = train1.h2o,
                      validation_frame = valid1.h2o,
                      family = "gaussian")

The GLM runs quickly, in 12 seconds, without producing an error, but as soon as I try to run either a GBM, a random forest, or a basic deep learning model, I get an error:

library(gbm)
h2o.gbm(y = y.dep, x = x.indep, training_frame = train1.h2o, validation_frame = valid1.h2o, ntrees = 2000, max_depth = 4, learn_rate = 0.01)

library(randomForest)
rforest.model <- h2o.randomForest(y = y.dep, x = x.indep, training_frame = train1.h2o, validation_frame = valid1.h2o, ntrees = 1000, mtries = 3, max_depth = 4, seed = 1122)


dlearning.model <- h2o.deeplearning(y = y.dep,
                                    x = x.indep,
                                    training_frame = train1.h2o,
                                    validation_frame = valid1.h2o,
                                    epochs = 60,
                                    hidden = c(100, 100),
                                    activation = "Rectifier",
                                    seed = 1122)

I get the following error again and again. Please suggest what can be done to solve this, since the GLM runs fine but none of the other models run at all.

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused
Traceback:

It fails after reaching 10 to 11 percent for both models. Is there any hack or workaround that would let me at least run these algorithms so that I can submit my result? I am unable to build an ensemble model because of this.

Is there anything I can do, given that I have to run them in a Kaggle kernel only?

2
The tag "ML" concerns the programming language, not machine "learning". - molbdnilo
I have used the tag "machine-learning", not "ML". - jatin singh
No you didn't. It looks like you did because I replaced it for you a couple of hours ago. - molbdnilo
Thanks @molbdnilo. Appreciate it. - jatin singh
I think that you don't need the packages gbm, glm2 or randomForest if you use h2o. Did you type something like library(h2o); localH2O = h2o.init(nthreads = -1)? - MrSmithGoesToWashington

2 Answers

0
votes

Failed to connect to localhost port 54321: Connection refused

This is an issue caused by how Kaggle is running H2O in their kernels (which are probably Docker images). The H2O R client is not able to connect to the local H2O server running at localhost:54321.

Something you can try is to start the H2O cluster on a different port. So instead of running h2o.init() do something like h2o.init(port=44444). If they are allowing many people to start H2O clusters on the same machine/port, that may cause some issues. If you are already connected to the H2O cluster in your session, then first run h2o.shutdown(prompt = FALSE) before re-starting H2O on a different port.
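A minimal sketch of that restart, assuming the h2o package is available in the kernel. Port 44444 is just the arbitrary example from above; the tryCatch wrapper and the short pause are my own additions for illustration, not something H2O requires:

```r
library(h2o)

# Shut down any cluster this session is already attached to.
# tryCatch lets the script continue if no cluster is running
# (assumption: an error here only means there was nothing to shut down).
tryCatch(h2o.shutdown(prompt = FALSE), error = function(e) NULL)

# Give the old JVM a moment to exit (5 seconds is an arbitrary choice).
Sys.sleep(5)

# Start a fresh local cluster on a non-default port.
h2o.init(port = 44444, nthreads = -1)

# Confirm the client can reach the new cluster before training anything.
h2o.clusterInfo()
```

Frames such as train1.h2o and valid1.h2o live inside the cluster, so they have to be created again after the restart before calling h2o.gbm or h2o.deeplearning.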

I also suggest that you contact a Kaggle admin to see if they can help debug the issue. We've seen issues like this before with Kaggle kernels.

0
votes

You're not able to connect to the server because kernels don't have an internet connection. :)

Update: I've done some more digging and internet access shouldn't be the issue here. I'll file a bug.