1 vote

I am trying to run a Spark job using SparkLauncher. My Spark application jar is not a fat jar, and it depends on a lot of other third-party jars. Is there a way to specify dependency jars in SparkLauncher?


2 Answers

4 votes

Use addJar; see https://spark.apache.org/docs/latest/api/java/org/apache/spark/launcher/SparkLauncher.html#addJar(java.lang.String)

Process spark = new SparkLauncher()
            .addJar("/path/to/local/jar/file1.jar")
            .addJar("/path/to/local/jar/file2.jar")
            .launch();

The jar files will be distributed to the cluster in this case.

Or add them to the DRIVER_EXTRA_CLASSPATH and EXECUTOR_EXTRA_CLASSPATH (but then the dependencies must be distributed manually, or placed in a shared folder that every worker can access):

Process spark = new SparkLauncher()
            .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, "/path/to/jar/file.jar")
            .setConf(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH, "/path/to/jar/file.jar")
            .launch();

You can also include multiple jar files by putting a wildcard on the classpath:

Process spark = new SparkLauncher()
            .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, "/path/to/jar/*")
            .setConf(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH, "/path/to/jar/*")
            .launch();
1 vote

When adding multiple jars with the addJar method, we hit an error saying the file path is incorrect, or "The filename, directory name, or volume label syntax is incorrect". The reason is that SparkLauncher internally calls spark-submit, which (on Windows) has trouble handling the double-quoted, comma-separated jar list. The moment I copied the contents of spark-submit2.cmd into spark-submit.cmd, the problem was resolved and we were able to execute the driver.