
I am new to Spark programming. I want to create a fat jar that includes all of the dependency jars as well. Currently I run my Spark application with the following command:

./spark-submit --class XYZ --jars dependency_1,dependency_2 main.jar

But I don't want to pass these dependency jars every time. I googled it but could not find a working solution.

One approach I tried was the sbt-assembly plugin, but it gives the following error:

[error] Not a valid command: assembly
[error] Not a valid project ID: assembly
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assembly
[error] assembly
[error]  

So, does anyone have an idea of the best way to create a fat jar?

Thanks in advance.

Edit 1:

My build.sbt:

import AssemblyKeys._

assemblySettings

name := "Update Sim Count"

version := "1.0"

scalaVersion := "2.10.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-RC1"

libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.12"

assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Edit 2:

The answer given by @Chobeat worked. I followed the blog post it links to. There is no need for the build.scala file from that blog: you only need to add assembly.sbt and a few lines to build.sbt, and that will work for you. Thanks @Chobeat for your help.
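For reference, a build.sbt along those lines might look like the sketch below. It assumes sbt 0.13.x with sbt-assembly 0.14.x (the plugin version used in the question). From 0.12.0 on, sbt-assembly is an auto plugin, so the old import AssemblyKeys._ and assemblySettings lines are dropped. Marking the Spark artifacts as "provided" is a common convention for spark-submit rather than something stated above, and the jar name update-sim-count-assembly.jar is a hypothetical choice.

```scala
// build.sbt: sketch for sbt 0.13.x + sbt-assembly 0.14.x (assumed versions).
// No `import AssemblyKeys._` / `assemblySettings`: the plugin is auto-enabled.

name := "Update Sim Count"

version := "1.0"

scalaVersion := "2.10.0"

// Spark is already on the cluster classpath when launched via spark-submit,
// so it is marked "provided" and kept out of the fat jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"

// These are not shipped with Spark, so they are bundled into the fat jar.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-RC1"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.12"

// Hypothetical fixed jar name, so the spark-submit command stays stable.
assemblyJarName in assembly := "update-sim-count-assembly.jar"
```

After running sbt assembly, the jar typically lands under target/scala-2.10/ and can be submitted without --jars, e.g. ./spark-submit --class XYZ target/scala-2.10/update-sim-count-assembly.jar. One trade-off of "provided": those dependencies are also left off the classpath of sbt run.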


1 Answer


Remember to add the sbt-assembly plugin to your project. assembly is not a built-in sbt command; it only exists once the plugin is loaded.
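As a sketch of what adding the plugin means: sbt loads plugins declared in .sbt files inside the project/ subdirectory of the build, not in the project root. The file below reuses the plugin version from the question. Whether a misplaced assembly.sbt caused the error above is an assumption, since the question does not say where the file was put, but it is a plausible cause of "Not a valid command: assembly".

```scala
// project/assembly.sbt
//
// Assumed layout:
//   build.sbt
//   project/assembly.sbt   <- this file (plugins are declared here)
//   src/main/scala/...
//
// Placed next to build.sbt instead, this line would not load the plugin,
// and the `assembly` task would not exist.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
```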

Building a fat jar for Spark is a bit tricky at first, but it's not black magic, and it is the correct way to achieve what you want.
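Much of that trickiness usually shows up as "deduplicate: different file contents found" errors, when two dependency jars contain the same path. A minimal sketch for sbt-assembly 0.14.x is to override assemblyMergeStrategy for the offending path and fall back to the plugin's default rules for everything else. The io.netty.versions.properties entry is only an illustrative assumption (several Netty jars ship that file); substitute whatever path your own error message names.

```scala
// build.sbt (addition): merge rules for duplicate paths, sbt-assembly 0.14.x.
assemblyMergeStrategy in assembly := {
  // Illustrative assumption: several Netty jars each contain this file;
  // keeping the first copy found is a common workaround.
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
  // Every other path is handled by the plugin's default strategy.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Delegating to the old strategy keeps the default handling (for example of manifests and signature files) intact, so only the paths you list explicitly change behavior.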

Follow a tutorial such as this one and you should be fine:

http://blog.prabeeshk.com/blog/2014/04/08/creating-uber-jar-for-spark-project-using-sbt-assembly/