I have been struggling to get a Multi Node H2O cluster up and running using AWS EC2 instances.
I have followed the advice from this thread, but the nodes still do not see each other. The EC2 instances all use the same pre-built AMI, so the same h2o.jar file is on all of them.
I have also tried the following troubleshooting advice:
- Name the cluster with the -name flag
- Use the -network flag instead
- Open port 54321 on the security group to 0.0.0.0
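To confirm that the security group change actually took effect, a TCP reachability check can be run from each instance against the others. The following is only a minimal sketch in base R: `port_open` and `check_nodes` are hypothetical helper names, and checking 54322 in addition to 54321 rests on the assumption that H2O uses the port after the API port for node-to-node traffic. It tests TCP only, so it says nothing about UDP.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened.
port_open <- function(host, port, timeout = 2) {
  tryCatch({
    con <- suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    )
    close(con)
    TRUE
  }, error = function(e) FALSE)  # refused or timed-out connections end up here
}

# Private IPs of the three nodes used in this post
nodes <- c("172.31.8.210", "172.31.9.207", "172.31.13.136")

# 54321 is the port passed via -port; 54322 is an assumption (port + 1)
ports <- c(54321, 54322)

# Hypothetical helper: one row per node/port pair with the check result
check_nodes <- function(nodes, ports) {
  grid <- expand.grid(host = nodes, port = ports, stringsAsFactors = FALSE)
  grid$open <- mapply(port_open, grid$host, grid$port, USE.NAMES = FALSE)
  grid
}
```

Calling `check_nodes(nodes, ports)` on each instance in turn would show which node/port pairs are unreachable from that machine. A row for a remote node with `open` equal to `FALSE` while H2O is running there would point at the security group or the network rather than at H2O itself.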
Here are my steps:
1) Start the AWS EC2 instances in the same availability zone and get their private IPs and the network CIDR (172.31.0.0/20). Put the IP addresses into flatfile.txt:
172.31.8.210:54321
172.31.9.207:54321
172.31.13.136:54321
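As a sanity check on this step, the flatfile entries can be generated from the private IPs after verifying that each one lies inside the CIDR range passed to -network. This is a sketch under stated assumptions: `ip_to_num` and `in_cidr` are hypothetical helpers written for illustration (IPv4 only), and the file is written to a temporary directory so that an existing flatfile.txt is not overwritten.

```r
# Convert a dotted-quad IPv4 string to a number (double avoids integer overflow)
ip_to_num <- function(ip) {
  parts <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(parts * 256^(3:0))
}

# TRUE if `ip` falls inside `cidr`, e.g. "172.31.0.0/20"
in_cidr <- function(ip, cidr) {
  pieces <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  block <- 2^(32 - as.integer(pieces[2]))  # number of addresses in the range
  floor(ip_to_num(ip) / block) == floor(ip_to_num(pieces[1]) / block)
}

private_ips <- c("172.31.8.210", "172.31.9.207", "172.31.13.136")
cidr <- "172.31.0.0/20"

# Stop early if any node is outside the range given to -network
inside <- vapply(private_ips, in_cidr, logical(1), cidr = cidr)
stopifnot(all(inside))

# One ip:port entry per line, matching the flatfile shown above
flatfile_lines <- paste0(private_ips, ":54321")
writeLines(flatfile_lines, file.path(tempdir(), "flatfile.txt"))
```

A /20 starting at 172.31.0.0 spans 172.31.0.0 through 172.31.15.255, so all three addresses above pass the check.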
2) Copy flatfile.txt to all servers that should join the cluster as nodes, and start H2O on each:
# cluster_run
library(h2oEnsemble)
library(ssh)

ips <- gsub("(.*):.*", "\\1", readLines("flatfile.txt"))

start_cluster <- function(ip){
  # Copy flatfile across
  session <- ssh_connect(paste0("ubuntu@", ip), keyfile = "mykey.pem")
  scp_upload(session, "flatfile.txt")

  # Ensure no h2o instance is already running
  out <- ssh_exec_wait(session, "sudo pkill java")

  # Start H2O cluster (gsub collapses the line breaks into single spaces)
  cmd <- gsub("\\s+", " ", paste0("ssh -i mykey.pem -o 'StrictHostKeyChecking no' ubuntu@", ip,
    " 'java -Xmx20g
    -jar /home/rstudio/R/x86_64-pc-linux-gnu-library/3.5/h2o/java/h2o.jar
    -name mycluster
    -network 172.31.0.0/20
    -flatfile flatfile.txt
    -port 54321 &'"))
  system(cmd, wait = FALSE)
}

start_cluster(ips[3])
start_cluster(ips[2])
start_cluster(ips[1])
3) Once this is done, I want to connect R to my new multi-node cluster:
h2o.init(startH2O = FALSE)
h2o.shutdown(prompt = FALSE)
This is where I see that the nodes aren't being picked up:
I have also seen that when I start H2O on the different nodes, it isn't picking up the other machines within the network: