10 votes

The spark-daria project is uploaded to Spark Packages, and I'm accessing the spark-daria code in another SBT project with the sbt-spark-package plugin.

I can include spark-daria in the fat JAR file generated by sbt assembly with the following code in the build.sbt file.

// Pull spark-daria in from Spark Packages via the sbt-spark-package plugin
spDependencies += "mrpowers/spark-daria:0.3.0"

// Keep only spark-daria in the fat JAR: exclude every other JAR
// on the assembly classpath from the output of sbt assembly
val requiredJars = List("spark-daria-0.3.0.jar")
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { f =>
    !requiredJars.contains(f.data.getName)
  }
}

This code feels like a hack. Is there a better way to include spark-daria in the fat JAR file?

N.B. I want to build a semi-fat JAR file here. I want spark-daria to be included in the JAR file, but I don't want all of Spark in the JAR file!


1 Answer

1 vote

The sbt-spark-package README for version 0.2.6 states the following:

In any case where you really can't specify Spark dependencies using sparkComponents (e.g. you have exclusion rules) and configure them as provided (e.g. standalone jar for a demo), you may use spIgnoreProvided := true to properly use the assembly plugin.

You should then set this flag in your build definition and mark your Spark dependencies as provided, as I do with spark-sql 2.2.0 in the following example:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
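
Putting it together, a minimal build.sbt sketch might look like this (assuming sbt-spark-package and sbt-assembly are both enabled in your project; the versions are the ones from the question and are only illustrative):

// Per the README quote above: leave provided handling to the assembly plugin
spIgnoreProvided := true

// spark-daria is a regular dependency, so sbt assembly bundles it
spDependencies += "mrpowers/spark-daria:0.3.0"

// Spark is compiled against but kept out of the fat JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"

With this in place, sbt assembly should produce the semi-fat JAR the question asks for: spark-daria is bundled, Spark is not, and the assemblyExcludedJars filter is no longer needed.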

Please note that with this setting your IDE may no longer have the dependency references it needs to compile and run your code locally, which means you would have to add the necessary JARs to the classpath by hand. I do this often in IntelliJ: I keep a Spark distribution on my machine and add its jars directory to the IntelliJ project definition (this question may help you with that, should you need it).
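
If you would rather keep everything in sbt instead of wiring JARs into the IDE, a common workaround (described in the sbt-assembly documentation; adapt it to your sbt version) is to have run use the compile classpath, which still includes provided dependencies:

// Run locally with the provided Spark dependencies on the classpath
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated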