0
votes

I am using spark-submit for my job with the command below:

spark-submit script_test.py --master yarn --deploy-mode cluster

The job works fine, and I can see it in the Spark History Server UI. However, I cannot see it in the ResourceManager UI (YARN).

I have the feeling that my job is not being sent to the cluster but is running on a single node only. However, I see nothing wrong with the way I use the spark-submit command.

Am I wrong? How can I check? Or how can I send the job to the YARN cluster?
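One thing worth noting about the command above: spark-submit treats everything after the application file as arguments to the application itself, so options placed after `script_test.py` are forwarded to the script and ignored by Spark, which then falls back to its default master. A corrected invocation would put the options first (sketch; `--master yarn --deploy-mode cluster` is the current form, replacing the old `yarn-cluster` alias):

```shell
# Options must come BEFORE the application file; anything after
# script_test.py is passed to the script as its own arguments.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  script_test.py
```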

1
Do you see any lines with application_XXX_XXX in the spark-submit output? If not, try changing the log level to INFO. Those are the IDs of the YARN applications. - Mariusz
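If the job did reach YARN, it will also show up in the YARN CLI; a quick check from any machine with the `yarn` client on its PATH:

```shell
# List applications known to the ResourceManager; a Spark job
# submitted to YARN appears here with an application_... ID.
yarn application -list -appStates ALL
```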

1 Answer

0
votes

Using --master yarn means that somewhere you have configured yarn-site with hosts, ports, and so on. Maybe the machine where you run spark-submit doesn't know where the YARN master is.

You could check your Hadoop/YARN/Spark config files, especially yarn-site.xml, to verify that the host of the ResourceManager is correct.

Those files are in different folders depending on which Hadoop distribution you are using. In HDP, I believe they are in /etc/hadoop/conf.
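As a minimal sketch, the relevant yarn-site.xml entry might look like this (the hostname is a placeholder for your actual ResourceManager host):

```xml
<configuration>
  <!-- The spark-submit machine must be able to resolve and reach this host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
</configuration>
```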

Hope it helps.