
I have an sbt project that I am trying to build into a jar with the sbt-assembly plugin.

build.sbt:

      name := "project-name"

      version := "0.1"

      scalaVersion := "2.11.12"

      val sparkVersion = "2.4.0"

      libraryDependencies ++= Seq(
        "org.scalatest" %% "scalatest" % "3.0.5" % "test",
        "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
        "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test",
        // spark-hive dependencies for DataFrameSuiteBase. https://github.com/holdenk/spark-testing-base/issues/143
        "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
        "com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
        "com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
        "com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
        //"org.apache.hadoop" % "hadoop-aws" % "3.1.1"
        "org.json" % "json" % "20180813"
      )

      assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
      assemblyMergeStrategy in assembly := {
       case PathList("META-INF", xs @ _*) => MergeStrategy.discard
       case x => MergeStrategy.first
      }
      test in assembly := {}

      // https://github.com/holdenk/spark-testing-base
      fork in Test := true
      javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
      parallelExecution in Test := false

When I build the project with sbt assembly, the resulting jar contains /org/junit/... and /org/opentest4j/... files.
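A quick way to see exactly which entries land in the jar is to list them programmatically; a minimal sketch, assuming sbt-assembly's default output path for the name, version, and scalaVersion above:

    import java.util.jar.JarFile
    import scala.collection.JavaConverters._

    object ListTestEntries extends App {
      // Assumed default sbt-assembly output path for this build
      val jar = new JarFile("target/scala-2.11/project-name-assembly-0.1.jar")
      try {
        jar.entries().asScala
          .map(_.getName)
          .filter(n => n.startsWith("org/junit") || n.startsWith("org/opentest4j"))
          .foreach(println)
      } finally jar.close()
    }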

Is there any way to keep these test-related files out of the final jar?

I have tried replacing the line:

    "org.scalatest" %% "scalatest" % "3.0.5" % "test"

with:

    "org.scalatest" %% "scalatest" % "3.0.5" % "provided"

I am also wondering how these files end up in the jar in the first place, since junit is not referenced in build.sbt (the project does contain JUnit tests, however).

Update: here is my current build.sbt with the exclusions applied:

    name := "project-name"

    version := "0.1"

    scalaVersion := "2.11.12"

    val sparkVersion = "2.4.0"

    val excludeJUnitBinding = ExclusionRule(organization = "junit")

    libraryDependencies ++= Seq(
      // Provided
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
      "org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
      "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
      "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
      "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
      "com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
      "com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",

      // Test
      "org.scalatest" %% "scalatest" % "3.0.5" % "test",

      // Necessary
      "org.json" % "json" % "20180813"
    )

    excludeDependencies += excludeJUnitBinding

    // https://stackoverflow.com/questions/25144484/sbt-assembly-deduplication-found-error
    assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    assemblyMergeStrategy in assembly := {
     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
     case x => MergeStrategy.first
    }


    // https://github.com/holdenk/spark-testing-base
    fork in Test := true
    javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
    parallelExecution in Test := false
By default, sbt-assembly does not include test jars. I had this problem when a dependency I included itself (incorrectly) listed a test framework as a runtime dependency. Do you know which package pulls in junit? – adhominem

I'm not sure. If I mark each dependency as "provided", the test files are still included. Would that mean it's not any of the included dependencies pulling them in at runtime? – Alex Shapovalov

1 Answer


To exclude certain transitive dependencies of a dependency, use the excludeAll or exclude methods.

The exclude method should be used when a pom will be published for the project. It requires the organization and module name to exclude.

For example:

libraryDependencies += 
  "log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")

The excludeAll method is more flexible, but because it cannot be represented in a pom.xml, it should only be used when a pom doesn’t need to be generated.

For example,

libraryDependencies +=
  "log4j" % "log4j" % "1.2.15" excludeAll(
    ExclusionRule(organization = "com.sun.jdmk"),
    ExclusionRule(organization = "com.sun.jmx"),
    ExclusionRule(organization = "javax.jms")
  )
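
An ExclusionRule can also pin a single module rather than a whole organization, which is useful when the same organization publishes artifacts you still need:

// Excludes only junit:junit; other artifacts from that organization are kept
val excludeJUnitOnly = ExclusionRule(organization = "junit", name = "junit")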

In certain cases a transitive dependency should be excluded from all dependencies. This can be achieved by adding ExclusionRules to excludeDependencies (available in sbt 0.13.8 and above).

excludeDependencies ++= Seq(
  ExclusionRule("commons-logging", "commons-logging")
)

The JUnit jar is pulled in transitively by the dependencies below:

"org.apache.spark" %% "spark-core" % sparkVersion % "provided" //(junit)
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided"// (junit)
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" //(org.junit)

To exclude the JUnit files, update your dependencies as below:

val excludeJUnitBinding = ExclusionRule(organization = "junit")

libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" excludeAll(excludeJUnitBinding)
)

Update: please change your build.sbt as below.

resolvers += Resolver.url("bintray-sbt-plugins",
  url("https://dl.bintray.com/eed3si9n/sbt-plugins/"))(Resolver.ivyStylePatterns)

val excludeJUnitBinding = ExclusionRule(organization = "junit")

libraryDependencies ++= Seq(
  // Provided
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  //"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
  //"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
  //"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",

  // Test
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",

  // Necessary
  "org.json" % "json" % "20180813"
)

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false
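
One caveat with the merge strategy above: discarding all of META-INF also drops META-INF/services registration files, which ServiceLoader-based discovery (used by Hadoop filesystems and Spark data sources, among others) relies on. If that bites, a variant that concatenates service files first is a reasonable sketch:

assemblyMergeStrategy in assembly := {
  // Concatenate ServiceLoader registration files instead of discarding them
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  // Discard the rest of META-INF (manifests, signature files)
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}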

project/plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

I have tried this and it no longer downloads the junit jar file.