
I recently started working with Spark (Scala), HDFS, sbt, and Livy. I am currently trying to create a Livy batch job.

Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This is the error shown in the Livy batch log.

My spark-submit command works perfectly fine with the local .jar file:

spark-submit --class "SimpleApp" --master local target/scala-2.11/simple-project_2.11-1.0.jar

But the same job submitted through Livy (via cURL) throws an error:

"requirement failed: Local path /target/scala-2.11/simple-project_2.11-1.0.jar cannot be added to user sessions."

So I moved the .jar file to HDFS. My new Livy request is:

curl -X POST --data '{
    "file": "/jar/project.jar",
    "className": "SimpleApp",
    "args": ["ddd"]
}' \
-H "Content-Type: application/json" \
http://server:8998/batches

This throws the error mentioned above.

Please let me know where I am going wrong.

Thanks in advance!


3 Answers

Answer 1

hdfs://localhost:9001/jar/project.jar.

Livy is expecting your jar file to be located on HDFS.

If the jar is local, try specifying the protocol in the path, or simply upload the jar to HDFS:

 "file": "file:///absolute_path/jar/project.jar",
Answer 2

You have to build a fat jar containing your codebase plus the necessary dependency jars (with sbt-assembly or the equivalent Maven plugin), upload that jar to HDFS, and then run spark-submit with the jar located on HDFS; alternatively, you can submit it via cURL.

Steps with Scala/Java (an end-to-end sketch follows the cURL template below):

  1. Build the fat jar with sbt/Maven or whatever you prefer.
  2. Upload the fat jar to HDFS.
  3. Use cURL to submit the job:

curl -X POST --data '{ your JSON payload here }' -H "Content-Type: application/json" your_ip:8998/batches
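
A minimal end-to-end sketch of those three steps (the jar name, the /jar HDFS directory, and your_ip are illustrative assumptions):

# 1. Build the fat jar (assumes the sbt-assembly plugin is set up in project/plugins.sbt)
sbt assembly

# 2. Upload the fat jar to HDFS
hdfs dfs -put target/scala-2.11/simple-project-assembly-1.0.jar /jar/project.jar

# 3. Submit the batch to Livy
curl -X POST --data '{
    "file": "hdfs:///jar/project.jar",
    "className": "SimpleApp",
    "args": ["ddd"]
}' \
-H "Content-Type: application/json" \
http://your_ip:8998/batches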

If you don't want to build a fat jar and upload it to HDFS, you can consider Python scripts: code can be submitted as plain text, without any jar file.

An example with plain Python code:

curl your_ip:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"print(\"asdf\")"}'

In the data body, you have to send valid Python code. This is the way tools like Jupyter Notebook/Torch work.
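
Note that this endpoint assumes an interactive session with id 0 already exists. If it doesn't, you can create one first through Livy's sessions API (a sketch; "pyspark" is the session kind for Python):

curl your_ip:8998/sessions -X POST -H 'Content-Type: application/json' -d '{"kind": "pyspark"}'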

I also made one more example with Livy and Python. To check the results:

curl your_ip:8998/sessions/0/statements/1

As mentioned above, for Scala/Java a fat jar uploaded to HDFS is required.

Answer 3

To use local files for Livy batch jobs, you need to add the local folder to the livy.file.local-dir-whitelist property in livy.conf.

Description from livy.conf.template:

List of local directories from where files are allowed to be added to user sessions. By default it's empty, meaning users can only reference remote URIs when starting their sessions.
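
For example, to whitelist the build directory from the question (a sketch; the exact directory path is an assumption and should match where your jar actually lives):

# in livy.conf
livy.file.local-dir-whitelist = /absolute/path/to/target/scala-2.11

After that, the batch request can reference the local jar path directly instead of an HDFS URI.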