
I am trying to run my Spark program using spark-submit on a YARN cluster. I am reading an external config file that is stored in HDFS, and I am running the job as:

./spark-submit --class com.sample.samplepack.AnalyticsBatch --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 --driver-java-options "-Dext.properties.dir=hdfs://namenode:8020/tmp/some.conf" PocSpark-1.0-SNAPSHOT-job.jar 10

But it is unable to read the file from HDFS. I have also tried running the job in local mode with the conf file given as an HDFS path, and I get:

java.io.FileNotFoundException: hdfs:/namenode:8020/tmp/some.conf (No such file or directory)

Here the forward slash after the hdfs: protocol is missing (hdfs:/ instead of hdfs://). Any help will be appreciated.
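The collapsed slash in the error message is what java.io.File does to any string it is given: it treats the argument as a local pathname and normalizes duplicate separators, so an hdfs:// URL cannot survive a trip through the local file API. A minimal sketch (class name hypothetical) reproducing the effect:

```java
import java.io.File;

public class SlashDemo {
    public static void main(String[] args) {
        // java.io.File treats its argument as a local pathname and
        // collapses repeated separators, so "hdfs://" becomes "hdfs:/"
        File f = new File("hdfs://namenode:8020/tmp/some.conf");
        System.out.println(f.getPath()); // on Linux prints hdfs:/namenode:8020/tmp/some.conf
    }
}
```

This suggests the property file is being opened with the local file API rather than the Hadoop FileSystem API, which would explain why local mode fails the same way.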

Can you see this file using the hadoop utility? hadoop fs -ls /tmp/ - Nikita
Yes, the file is available, but in my opinion spark-submit is unable to read the HDFS file path. - Y0gesh Gupta
Do you have the environment variable HADOOP_CONF_DIR set? Type echo $HADOOP_CONF_DIR in the console to check. - Nikita
Hi, thanks for the reply, but yes, HADOOP_CONF_DIR is set. The issue is that spark-submit is collapsing the '//' in hdfs://namenode:8020/tmp/some.conf to hdfs:/namenode:8020/tmp/some.conf and so cannot reach the HDFS path. - Y0gesh Gupta

2 Answers


You have to set the HADOOP_CONF_DIR environment variable. It must point to the directory containing core-site.xml (it may be something like ../hadoop-2.6.0/etc/hadoop_dir), and core-site.xml must contain:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://yourHost:54310</value>
    </property>
</configuration>

Hope this will help!
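As a side note (not part of the original answer): once the path is handled as a URI rather than a local file, the double slash survives, because it introduces the authority component (host:port). A quick sketch with java.net.URI:

```java
import java.net.URI;

public class UriDemo {
    public static void main(String[] args) {
        URI u = URI.create("hdfs://namenode:8020/tmp/some.conf");
        // The "//" delimits the authority component, so it is preserved
        System.out.println(u.getScheme());    // hdfs
        System.out.println(u.getAuthority()); // namenode:8020
        System.out.println(u.getPath());      // /tmp/some.conf
    }
}
```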


Setting the spark-submit parameter as follows may fix the issue:

hdfs:///namenode:8020//tmp//some.conf