For some time I have been using the sparklyr package to connect to my company's Hadoop cluster with the following code:
library(sparklyr)

Sys.setenv(SPARK_HOME = "/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME = "/usr/lib/jvm/jre")

system('kinit -k -t user.keytab user@xyz')

sc <- spark_connect(master = "yarn",
                    config = list(
                      default = list(
                        spark.submit.deployMode = "client",
                        spark.yarn.keytab = "user.keytab",
                        spark.yarn.principal = "user@xyz",
                        spark.executor.instances = 20,
                        spark.executor.memory = "4G",
                        spark.executor.cores = 4,
                        spark.driver.memory = "8G")))
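For reference, once that connection is up I verify it with a trivial smoke test before doing any real work (just standard sparklyr helpers, nothing specific to my setup):

spark_version(sc)   # report the Spark version the connection is running against
sdf_len(sc, 10)     # create and print a small Spark DataFrame as a quick test job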
Everything works fine. However, when I try to add the rsparkling package using similar code:
library(h2o)
library(rsparkling)
library(sparklyr)

options(rsparkling.sparklingwater.version = '2.0')

Sys.setenv(SPARK_HOME = "/opt/spark/")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(JAVA_HOME = "/usr/lib/jvm/jre")

system('kinit -k -t user.keytab user@xyz')

sc <- spark_connect(master = "yarn",
                    config = list(
                      default = list(
                        spark.submit.deployMode = "client",
                        spark.yarn.keytab = "user.keytab",
                        spark.yarn.principal = "user@xyz",
                        spark.executor.instances = 20,
                        spark.executor.memory = "4G",
                        spark.executor.cores = 4,
                        spark.driver.memory = "8G")))
I get the following error:
Error in force(code) :
  Failed while connecting to sparklyr to port (8880) for sessionid (9819):
  Sparklyr gateway did not respond while retrieving ports information after 60 seconds
    Path: /opt/spark-2.0.0-bin-hadoop2.6/bin/spark-submit
    Parameters: --class, sparklyr.Backend, --packages,
      'ai.h2o:sparkling-water-core_2.11:2.0','ai.h2o:sparkling-water-ml_2.11:2.0','ai.h2o:sparkling-water-repl_2.11:2.0',
      '/usr/lib64/R/library/sparklyr/java/sparklyr-2.0-2.11.jar', 8880, 9819

---- Output Log ----
Ivy Default Cache set to: /opt/users/user/.ivy2/cache
The jars for the packages stored in: /opt/users/user/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.0.0-bin-hadoop2.6/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
ai.h2o#sparkling-water-core_2.11 added as a dependency
ai.h2o#sparkling-water-ml_2.11 added as a dependency
ai.h2o#sparkling-water-repl_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]

---- Error Log ----
In addition: Warning messages:
1: In if (nchar(config[[e]]) == 0) found <- FALSE :
  the condition has length > 1 and only the first element will be used
2: In if (nchar(config[[e]]) == 0) found <- FALSE :
  the condition has length > 1 and only the first element will be used
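Judging from the output log, the connection seems to stop at the Ivy dependency-resolution step, so one thing I intend to check (just a guess on my part) is whether the Sparkling Water artifacts ever landed in the local Ivy cache reported in the log above:

# List anything Sparkling Water related in the Ivy cache paths shown in the log
system("ls /opt/users/user/.ivy2/jars | grep -i sparkling")
system("ls /opt/users/user/.ivy2/cache/ai.h2o")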
I'm new to Spark and clusters and not really sure what to do now; any help would be much appreciated. My first thought was that the Sparkling Water jar files are missing on the cluster side. Am I right?
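If that is indeed the problem, my fallback idea (only a sketch, I have not tried it on the cluster yet) is to skip the --packages resolution entirely and hand Spark a locally downloaded Sparkling Water assembly jar through the standard spark.jars property; the jar path below is just a placeholder for wherever the assembly actually lives:

library(sparklyr)

# Placeholder path: the Sparkling Water assembly jar downloaded for the matching Spark version
sw_jar <- "/opt/users/user/sparkling-water/assembly/sparkling-water-assembly.jar"

sc <- spark_connect(master = "yarn",
                    config = list(
                      default = list(
                        spark.submit.deployMode = "client",
                        spark.yarn.keytab = "user.keytab",
                        spark.yarn.principal = "user@xyz",
                        spark.jars = sw_jar)))

Would that be a reasonable workaround, or is there a proper way to make the --packages resolution work?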