I am trying to build an uber jar so that I can deploy my Spark program, by running:

sbt assembly

This outputs a lot of errors:

[error] deduplicate: different file contents found in the following:
[error] /Users/samibadawi/.ivy2/cache/commons-collections/commons-collections/jars/commons-collections-3.2.1.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
[error] /Users/samibadawi/.ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class

The answers to the related question, which pertains to Scala 2.10, did not work for me: spark + sbt-assembly: "deduplicate: different file contents found in the following"

After much hacking I got a hello world project, without any useful code, to compile using the build.sbt file below.

It seems to be random what goes into exclude and what goes into the merge strategy. Is there a simpler, more systematic way to do this?

(Besides using "org.apache.spark" %% "spark-core" % sparkVersion % "provided", in which case there are no dependencies to deploy.)

build.sbt excerpt:

import sbtassembly.AssemblyPlugin._

//Define dependencies. These ones are only required for Test and Integration Test scopes.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % sparkVersion).
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-logging", "commons-logging").
    exclude("com.esotericsoftware.minlog", "minlog").
    exclude("com.codahale.metrics", "metrics-core").
    exclude("aopalliance","aopalliance")
    ,
  "org.scalatest"   %% "scalatest"    % "2.2.4"   % "test,it"
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x => old(x)
  }
}

Project.inConfig(Test)(assemblySettings)
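
Note that the `<<=` operator used above was removed in sbt 1.0, and sbt-assembly renamed the `mergeStrategy` key to `assemblyMergeStrategy`. A minimal sketch of how a few of the same rules might look in the newer style is shown here. It assumes sbt-assembly 0.14.x on sbt 0.13.x and has not been tested against this particular build; the three cases are just a subset of the ones listed above.

```scala
// build.sbt (sketch; assumes sbt-assembly 0.14.x key names)
assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*)    => MergeStrategy.last
  case "log4j.properties"                    => MergeStrategy.last
  case x =>
    // Fall back to the plugin's default strategy for every other path.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

The final case plays the same role as `case x => old(x)` in the original block: anything not matched explicitly is handled by the default strategy.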

1 Answer

After a little more trial and error I made a build.sbt that worked for my real program.

One problem I had was duplicate versions of the Postgres driver jar. I solved this by commenting out these dependencies:

//  "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
//  "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts

I had not yet started to use PostGIS, and it depends on postgresql-8.3-603.jdbc4.jar.

I also had to take out the direct dependency on Postgres.
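
An alternative to commenting the dependency out entirely might be to keep postgis-jdbc but exclude the old driver it pulls in transitively. The sketch below is untested against this build. It assumes the old driver jar is published under the coordinates postgresql:postgresql (organization and name are both assumptions, inferred from the jar file name), and that a newer driver still reaches the classpath through another dependency.

```scala
// build.sbt (sketch; the "postgresql" organization and artifact name are assumptions)
libraryDependencies +=
  ("org.postgis" % "postgis-jdbc" % "1.3.3")
    .exclude("postgresql", "postgresql") // drop the transitive 8.3 driver
```

This is the same `exclude` mechanism used on spark-core in the question, applied to one transitive dependency instead of six.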

From the working build.sbt:

val doobieVersion = "0.4.1"

libraryDependencies ++= Seq(
  "ch.qos.logback" % "logback-classic" % "1.0.13", //comment and warning go away
  "ch.qos.logback" % "logback-core" % "1.0.13",
  "com.citymaps" % "tile-library" % "1.4",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.2",
  "com.github.scopt" %% "scopt" % "3.5.0",
  "com.typesafe.play" %% "play-json" % "2.5.9",
  "org.apache.spark" %% "spark-core" % sparkVersion  % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
  "graphframes" % "graphframes" % "0.3.0-spark2.0-s_2.11",
  "org.clapper" %% "grizzled-slf4j" % "1.3.0",
//  "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
//  "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts
  "org.scalatest" %% "scalatest" % "3.0.0" % "test" withSources() withJavadoc(),
  "org.spire-math" %% "spire" % "0.11.0",
  "org.tpolecat" %% "doobie-core-cats" % doobieVersion,
  "org.tpolecat" %% "doobie-postgres-cats"   % doobieVersion
)

After running

sbt clean

this stopped working. It turns out that there is a conflict for postgis-jdbc: the latest version is 2.2.1, but the latest version available in the normal Maven repositories is 1.3.3, and that one depends on the old Postgres driver jar.

I looked in many repos and could not find postgis-jdbc 2.2.1.

So I downloaded the 2.2.1 release from https://github.com/postgis/postgis-java

This release has its version set to 2.2.2SNAPSHOT, so change the version number to 2.2.1 in pom.xml and jdbc/pom.xml.

Build the jar with this command. It is picky about the Maven version:

/usr/local/Cellar/maven/3.3.9/bin/mvn install

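Now add the local Maven repository as a resolver and include the dependency:

// mvn install puts the jar into the local Maven repository
resolvers ++= Seq(
  Resolver.mavenLocal
)

libraryDependencies += "net.postgis" % "postgis-jdbc" % "2.2.1"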

And run

sbt assembly

It finally worked.