9
votes

I started a Spark job in YARN cluster mode through spark-submit. To indicate partial failure etc., I want to pass an exit code from the driver to the script calling spark-submit.

I tried both System.exit and throwing SparkUserAppException in the driver, but in both cases the CLI only got 1, not the exit code I passed.
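For context, the System.exit attempt looked roughly like this (a minimal sketch, not the exact code; the exit codes and the failure check are placeholders):

```scala
import org.apache.spark.sql.SparkSession

object PartialFailureJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PartialFailureJob").getOrCreate()
    try {
      // ... actual job logic ...
      val partiallyFailed = true // placeholder for real detection logic
      spark.stop()
      if (partiallyFailed) System.exit(3) // hoped the CLI would see 3
    } catch {
      case _: Exception =>
        spark.stop()
        System.exit(2) // hoped the CLI would see 2
    }
    // In both cases spark-submit in yarn cluster mode exits with 1.
  }
}
```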

I suspect it is impossible to pass a custom exit code: whatever code the driver exits with gets converted to a YARN application status, and YARN maps any failure to 1 (or FAILED).

2
Could you tell me the command that you used to submit the job? - code
$SPARK_HOME/bin/spark-submit --verbose .... --master yarn --deploy-mode cluster .... <other options>. I use Spark 2.0.0 with Hadoop 2.3. Any particular option you are looking for? - Zxcv Mnb
I think --deploy-mode client would help you. Or at least with some hack, you should be able to achieve what you need. - code
I cannot use client mode. If it is not possible in YARN cluster mode, I would just like someone to confirm that. - Zxcv Mnb

2 Answers

1
votes

By looking at the Spark code, I can conclude this:

It is possible in client mode. Look at the runMain() method of the SparkSubmit class.

Whereas in cluster mode, it is not possible to get the exit status of the driver, because your driver runs inside the YARN ApplicationMaster container on the cluster, so the spark-submit process on the client never sees its exit status directly.

There is an alternative solution that may or may not suit your use case:

Host a REST API with an endpoint that receives status updates from your driver code. If any exception occurs, have the driver report its status through this endpoint, and let the calling script query the service after spark-submit returns. A sketch of the driver side follows.
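A minimal sketch of the reporting side, assuming a Scala driver; the endpoint URL and the JSON shape are made-up examples, not a real service:

```scala
// Sketch: POST a status update to an external endpoint from the driver.
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object StatusReporter {
  // endpoint is hypothetical, e.g. "http://myhost:8080/job-status"
  def report(endpoint: String, jobId: String, exitCode: Int): Unit = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val body = s"""{"jobId":"$jobId","exitCode":$exitCode}"""
    val out = conn.getOutputStream
    try out.write(body.getBytes(StandardCharsets.UTF_8)) finally out.close()
    conn.getResponseCode // send the request; the response body is not needed
    conn.disconnect()
  }
}
```

The script that called spark-submit can then ask the same service for the final status instead of relying on the process exit code.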

1
votes

You can save the exit code in an output file (on HDFS or the local FS) and make your script wait for the file to appear, read it, and proceed. This is definitely not an elegant way, but it may help you move forward. When saving the file, pay attention to the permissions on that location: your Spark process needs read/write access. A sketch of the driver side follows.
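A minimal sketch, assuming a Scala driver; the HDFS path is a hypothetical example:

```scala
// Sketch: write the driver's exit code to a file the submitting script can poll.
import java.nio.charset.StandardCharsets

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object ExitCodeFile {
  // path is hypothetical, e.g. "hdfs:///tmp/myjob/exitcode"
  def write(spark: SparkSession, path: String, exitCode: Int): Unit = {
    val p = new Path(path)
    val fs: FileSystem = p.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val out = fs.create(p, true) // overwrite any stale file from a previous run
    try out.write(exitCode.toString.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }
}
```

After spark-submit returns, the calling script can wait for the file (for example, poll with hdfs dfs -test -e on that path), read it with hdfs dfs -cat, and exit with that code itself.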