Running spark shell on yarn client error

Question

I have Spark 1.6.1 and I have set

export HADOOP_CONF_DIR=/folder/location

Now if I run spark shell: $ ./spark-shell --master yarn --deploy-mode client I get this type of error (relevant part)

    $ 16/09/18 15:49:18 INFO impl.TimelineClientImpl: Timeline service address: http://URL:PORT/ws/v1/timeline/
16/09/18 15:49:18 INFO client.RMProxy: Connecting to ResourceManager at URL/IP:PORT
16/09/18 15:49:18 INFO yarn.Client: Requesting a new application from cluster with 9 NodeManagers
16/09/18 15:49:19 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (14336 MB per container)
16/09/18 15:49:19 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/09/18 15:49:19 INFO yarn.Client: Setting up container launch context for our AM
16/09/18 15:49:19 INFO yarn.Client: Setting up the launch environment for our AM container
16/09/18 15:49:19 INFO yarn.Client: Preparing resources for our AM container
16/09/18 15:49:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/09/18 15:49:19 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=Menmosyne, access=WRITE, inode="/user/Mnemosyne/.sparkStaging/application_1464874056768_0040":hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)

However when I run simply

$ ./spark-shell

(without specifying master) I get a lot more configurations on the screen than usual (ie it should load the configurations in the hadoop folder). So if I don't specify that the master is yarn, do my spark jobs still get submitted to the yarn cluster or not?

RojoSam RojoSam · Accepted Answer · 2016-09-18T17:05:07

The default master in spark is local, that means that the application will run local in your machine and not in the cluster.

Yarn applications, in general (hive, mapreduce, spark, etc...), require to create temporal folders to store the partial data and/or current process configuration. Normally this temporal data is being written inside the HDFS user home (in your case /user/Mnemosyne)

Your problem is that your home folder was created by the user hdfs and your user Mnemosyne doesn't have privileges to write on it.

Then the spark job can not create the temporal structure in HDFS required to launch the application.

My suggestion is that you change the owner of the home folder (each user should be the owner of its home directory) and vaidate that the owner has full access to its home directory.

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#chown

Running spark shell on yarn client error

3 Answers