3
votes

I have developed a Spark application in Java using Eclipse.
So far, I have been running it in local mode by setting the master's address to 'local[*]'.
Now I want to deploy the application on a YARN cluster.
The only official documentation I found is http://spark.apache.org/docs/latest/running-on-yarn.html

Unlike the documentation for deploying on a Mesos cluster or in standalone mode (http://spark.apache.org/docs/latest/running-on-mesos.html), it does not give any URL to use as the master's address in SparkContext.
Apparently, I have to deploy Spark on YARN from the command line.

Do you know if there is a way to configure the master's address in the SparkContext, as in standalone and Mesos modes?


1 Answer

6
votes

There actually is a URL; it just lives in the Hadoop client configuration rather than in the master string you pass to SparkContext.

Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.

You should have at least hdfs-site.xml, yarn-site.xml, and core-site.xml files that specify all the settings and URLs for the Hadoop cluster you connect to.

The properties in yarn-site.xml that give the master's address are yarn.resourcemanager.hostname and yarn.resourcemanager.address.

Since yarn.resourcemanager.address has a default of ${yarn.resourcemanager.hostname}:8032, you may only need to set the hostname.