PySpark: writing files to a local directory in YARN cluster mode

Question

I am trying to run my PySpark code in YARN cluster mode. My destination directory is a local directory. The user I run the spark-submit command as is a superuser and has all the privileges needed to read files from HDFS and write files to the local filesystem.

The job runs without any error, but no output directory or files are created.

I have also set HADOOP_USER_NAME to the superuser in my Spark code to avoid permission issues.

Can someone please help?

Mark Mark · Accepted Answer · 2019-07-16T13:53:36

If you are running in YARN cluster mode, the driver runs inside the YARN ApplicationMaster on one of the cluster nodes, not on the machine where you ran spark-submit, so anything written to a "local" path lands on that node's local filesystem. If you find which node it was, you should find your output directory and files there.
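One way to find that node is to have the job print where it is running. The snippet below is a minimal sketch, not a prescribed method: `describe_local_output` and `executor_hosts` are hypothetical helper names, and `/tmp/output` is a placeholder path. It reports the driver's hostname and absolute output path, and separately collects the hostnames of nodes that run a few probe tasks, because the tasks of a DataFrame or RDD write run on executors, and with a `file://` path each task writes to the local disk of whichever node it runs on.

```python
import os
import socket


def describe_local_output(path):
    """Return 'host:absolute_path' for a local path, as seen by this process.

    Called on the driver in YARN cluster mode, the host is the cluster node
    running the ApplicationMaster, not the machine that ran spark-submit.
    """
    return f"{socket.gethostname()}:{os.path.abspath(path)}"


def executor_hosts(sc, num_tasks=100):
    """Return sorted, de-duplicated hostnames of nodes that ran probe tasks.

    `sc` is an existing SparkContext (e.g. spark.sparkContext). The probe
    tasks may not reach every executor, so treat the result as a hint.
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return sorted(rdd.map(lambda _: socket.gethostname()).distinct().collect())


if __name__ == "__main__":
    # Placeholder path: replace with the output directory used by the job.
    print("Driver-side local path:", describe_local_output("/tmp/output"))
```

In cluster mode the driver's stdout goes to the YARN container logs rather than your terminal, so after the job finishes you can read the printed host with `yarn logs -applicationId <application ID>` and then look for the directory on that node.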
