4
votes

I am trying to run an application in YARN cluster mode. Here are the settings from my shell script:

spark-submit --class "com.Myclass" \
  --num-executors 2 \
  --executor-cores 2 \
  --master yarn \
  --supervise \
  --deploy-mode cluster \
  ../target/

Further, I am getting the following error. Here are the error details from the YARN logs for the application ID:

INFO : org.apache.spark.deploy.yarn.ApplicationMaster - Registered signal handlers for [TERM, HUP, INT]
DEBUG: org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:307)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:332)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:590)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.newConfiguration(YarnSparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:52)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.<init>(YarnSparkHadoopUtil.scala:47)

I tried modifying spark-env.sh as follows, and I see HADOOP_HOME logged, but I am still getting the above error. These are the entries I added to spark-env.sh:

export HADOOP_HOME="/usr/lib/hadoop"
echo "&&&&&&&&&&&&&&&&&&&&&& HADOOP HOME " 
echo "$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
echo "&&&&&&&&&&&&&&&&&&&&&& HADOOP_CONF_DIR " 
echo "$HADOOP_CONF_DIR"

I see HADOOP_HOME logged when I run spark-submit, but it still complains that HADOOP_HOME is not set.

1
Do YARN and your application run as the same user? – Sumit
Yes, I tried running the application as the same user as YARN, and I also tried hardcoding the Hadoop config path in the spark-submit command, but I am still getting the same issue. – Alchemist
I too have the same issue and can't find a solution. Like you, I added HADOOP_HOME to spark-env.sh and verified that it is being sourced at the time of running SparkLauncher, but the containers don't see this value and log the error. I tried setting hadoop.home.dir as a system property using the -D option in SPARK_SUBMIT_OPTS, and that too doesn't get passed to the submitted job, so the containers don't see it. – haridsv

1 Answer

3
votes

In my spark-env.sh it looks a bit different:

# Make Hadoop installation visible
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

Maybe this can help you. Remember to adjust the paths.
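
One more thing worth checking: the stack trace comes from the ApplicationMaster running inside a YARN container, where the spark-env.sh of the submitting machine may not be sourced. As a minimal sketch (reusing the /usr/lib/hadoop paths from your spark-env.sh, so adjust them to your cluster), you could also try forwarding the variables to the cluster-side processes through spark-submit configuration:

# Sketch: pass HADOOP_HOME / HADOOP_CONF_DIR to the YARN ApplicationMaster
# and to the executors, so the containers see them even if spark-env.sh
# is not sourced on the cluster nodes. Paths are taken from the question; adjust them.
spark-submit --class "com.Myclass" \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-cores 2 \
  --conf spark.yarn.appMasterEnv.HADOOP_HOME=/usr/lib/hadoop \
  --conf spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop \
  --conf spark.executorEnv.HADOOP_HOME=/usr/lib/hadoop \
  --conf spark.executorEnv.HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop \
  ../target/

If the environment variables still don't reach the containers, the same idea can be tried with -Dhadoop.home.dir=/usr/lib/hadoop via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions; that is roughly what the SPARK_SUBMIT_OPTS attempt in the comments was aiming at, but applied to the cluster-side JVMs instead of the launcher process.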