14
votes

I'm using Spark (via the Java API) and need a single jar that can be pushed to the cluster, but the jar itself should not include Spark. The app that deploys the jobs should, of course, include Spark.

I would like:

  1. sbt run - everything should be compiled and executed
  2. sbt smallAssembly - create a jar without spark
  3. sbt assembly - create an uber jar with everything (including spark) for ease of deployment.

I have 1. and 3. working. Any ideas on how I can do 2.? What code would I need to add to my build.sbt file?

The question is not specific to Spark; it applies to any other dependency that I may wish to exclude as well.

2
Suppose you use other libraries like Scalaz and Dispatch. Do you want those included in or excluded from the smallAssembly? – Eugene Yokota
Good question. Preferably excluded: smallAssembly should only contain the code that is to be deployed on the cluster. – user2843110

2 Answers

29
votes

% "provided" configuration

The first option to exclude a jar from the fat jar is to use the "provided" configuration on the library dependency. "provided" comes from Maven's provided scope, which is defined as follows:

This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.

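Since you're deploying your code to a container (in this case Spark), contrary to your comment you'd probably still need the Scala standard library and other library jars (e.g. Dispatch, if you used it) in the jar. Marking Spark as "provided" won't affect compile or test. Note that sbt's default run uses the runtime classpath, which leaves out provided dependencies, so if run cannot find the Spark classes, its classpath may need to be pointed at the compile classpath explicitly.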
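
If run fails with missing Spark classes after the dependency is marked "provided", a workaround along the following lines is described in the sbt-assembly README. This is a sketch in the same sbt 0.13-style syntax used elsewhere in this answer, not something tested against your build:

```scala
// build.sbt
// Make `run` use the compile classpath, which still contains "provided"
// dependencies, instead of the runtime classpath, which does not.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated
```

With this in place, sbt run should see Spark locally while sbt assembly still leaves it out of the fat jar.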

packageBin

If you just want your own compiled code, with no Scala standard library or other library dependencies, that is what packageBin, which is built into sbt, produces. This packaged jar can be combined with a dependency-only jar that you can make using sbt-assembly's assemblyPackageDependency.
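
To get the sbt smallAssembly command from the question on top of packageBin, one possibility is a command alias. This is a minimal sketch: the alias name smallAssembly is taken from the question, and it assumes the default package task (which runs packageBin for the compile configuration) produces the jar you want to ship.

```scala
// build.sbt
// `sbt smallAssembly` now runs `package`, i.e. compile:packageBin,
// producing a jar with only this project's classes and resources.
addCommandAlias("smallAssembly", "package")
```

The dependency-only jar can then be built separately with sbt assemblyPackageDependency when the cluster does not already supply those libraries.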

excludedJars in assembly

The final option is to use excludedJars in assembly:

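// Exclude specific jars from the fat jar by file name
excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  // the entries returned here are the ones left out of the assembly
  cp filter {_.data.getName == "spark-core_2.9.3-0.8.0-incubating.jar"}
}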
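
Since the question also asks about excluding other dependencies, the same setting can match several jars at once. The following is a sketch only: the prefix list is a hypothetical example, and matching on file-name prefixes is an assumption about how your jars are named.

```scala
// build.sbt
excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  // Hypothetical list of jar-name prefixes to keep out of the fat jar
  val excludedPrefixes = Seq("spark-core", "spark-sql")
  cp filter { entry =>
    excludedPrefixes.exists(prefix => entry.data.getName.startsWith(prefix))
  }
}
```

Prefix matching avoids hard-coding the Scala and library versions, at the cost of possibly excluding more jars than intended if names overlap.
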
7
votes

For beginners like me: simply add % Provided to the Spark dependencies to exclude them from the uber-jar:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % Provided
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.4.0" % Provided

in build.sbt.