I am trying to run a Spark job using SparkLauncher.
My Spark application jar is not a fat jar, and it depends on a lot of other third-party jars. Is there a way to specify dependency jars in SparkLauncher?
2 Answers
Use addJar, see
https://spark.apache.org/docs/latest/api/java/org/apache/spark/launcher/SparkLauncher.html#addJar(java.lang.String)
Process spark = new SparkLauncher()
    .addJar("/path/to/local/jar/file1.jar")
    .addJar("/path/to/local/jar/file2.jar")
    .launch();
The jar files will be distributed to the cluster in this case.
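For context, a complete launch usually also sets the application jar, main class, and master. A minimal sketch (the app jar path, main class name, and master URL below are placeholder assumptions, not from the question):

```java
import org.apache.spark.launcher.SparkLauncher;

public class LaunchWithDeps {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                // application jar and entry point (placeholder values)
                .setAppResource("/path/to/app.jar")
                .setMainClass("com.example.Main")
                .setMaster("local[*]")
                // each dependency jar is shipped alongside the application
                .addJar("/path/to/local/jar/file1.jar")
                .addJar("/path/to/local/jar/file2.jar")
                .launch();

        // block until the spawned spark-submit process finishes
        int exitCode = spark.waitFor();
        System.out.println("spark-submit exited with " + exitCode);
    }
}
```

Note that `launch()` spawns a `spark-submit` child process, so `spark-launcher` must be on the classpath and `SPARK_HOME` must be set (or passed via `setSparkHome`).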
Or add them to DRIVER_EXTRA_CLASSPATH and EXECUTOR_EXTRA_CLASSPATH (but then the dependencies need to be distributed manually, or placed in a shared folder that every worker can access).
Process spark = new SparkLauncher()
    .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, "/path/to/jar/file.jar")
    .setConf(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH, "/path/to/jar/file.jar")
    .launch();
You can also include multiple jar files by putting all files in a directory on the classpath with a wildcard:
Process spark = new SparkLauncher()
    .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, "/path/to/jar/*")
    .setConf(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH, "/path/to/jar/*")
    .launch();
When adding multiple jars with the addJar method, we hit an error saying the file path is incorrect, or "The filename, directory name, or volume label syntax is incorrect". The reason is that SparkLauncher internally calls spark-submit, which has a problem handling jars passed as a double-quoted, comma-separated list. The moment I copied the contents of spark-submit2.cmd into spark-submit.cmd, the problem was resolved and we were able to execute the driver.