0 votes

I am attempting to print messages consumed from Kafka via Spark streaming. However, I keep running into the following error:

16/09/04 16:03:33 ERROR ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

There have been a few questions asked on StackOverflow regarding this very issue. Ex: https://stackguides.com/questions/27710887/kafkautils-class-not-found-in-spark-streaming#=

The answers given have not resolved this issue for me. I have tried creating an "uber jar" using sbt assembly and that did not work either.

Contents of sbt file:

name := "StreamKafka"

version := "1.0"

scalaVersion := "2.10.5"


libraryDependencies ++= Seq(
    "org.apache.kafka" % "kafka_2.10" % "0.8.2.1" % "provided",
    "org.apache.spark" % "spark-streaming_2.10" % "1.6.1" % "provided",
    "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1" % "provided",
    "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided" exclude("com.esotericsoftware.minlog", "minlog") exclude("com.esotericsoftware.kryo", "kryo")
)

resolvers ++= Seq(
    "Maven Central" at "https://repo1.maven.org/maven2/"
)


assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")             => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")         => MergeStrategy.discard
  case "log4j.properties"                                     => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/")    => MergeStrategy.filterDistinctLines
  case "reference.conf"                                       => MergeStrategy.concat
  case PathList(ps @ _*) if ps.last endsWith "pom.properties" => MergeStrategy.discard
  // The catch-all must come last; any cases placed after it are unreachable.
  case _                                                      => MergeStrategy.first
}
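For reference, a minimal driver along the lines described above might look like this. This is only a sketch: the object name, ZooKeeper quorum, consumer group, and topic are illustrative assumptions, since the actual driver code is not shown.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamKafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamKafka")
    val ssc = new StreamingContext(conf, Seconds(5))

    // createStream(ssc, zkQuorum, groupId, Map(topic -> numThreads))
    val messages = KafkaUtils.createStream(
      ssc, "localhost:2181", "stream-kafka-group", Map("test" -> 1))

    // The stream yields (key, message) pairs; print the message values.
    messages.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}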
Comments:

Does Kafka have to be installed on the Spark node where the job is being submitted? – user3357381

Remove "provided" from spark-streaming-kafka. – vishnu viswanath

@vishnuviswanath I removed "provided". The same error is still being thrown. – user3357381

Oh, sorry. You have to remove "provided" from kafka_2.10 as well. – vishnu viswanath

@vishnuviswanath I tried that and I am still getting the same error: ERROR ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$ – user3357381

3 Answers

1 vote

Posting the answer from the comments so that it will be easy for others to solve the issue.

You have to remove "provided" from the Kafka dependencies, because those classes are not supplied by the Spark cluster at runtime. After the change, the two lines should read:

"org.apache.kafka" % "kafka_2.10" % "0.8.2.1",
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1"

For the dependencies to be bundled into the jar, you have to run sbt assembly again.

Also make sure that you are running the right jar file. You can find the assembled jar's name in the output of the sbt assembly command.
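Assuming the name and version from the question's build file, sbt assembly will typically write the uber jar to target/scala-2.10/StreamKafka-assembly-1.0.jar. A submission would then look roughly like this (the main class name is an assumption):

spark-submit --class StreamKafka --master yarn target/scala-2.10/StreamKafka-assembly-1.0.jar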

0 votes

This may be a silly question, but does streamkafka_2.10-1.0.jar actually contain org/apache/spark/streaming/kafka/KafkaUtils.class?
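One way to check is to list the jar's contents, assuming the jar name above:

jar tf streamkafka_2.10-1.0.jar | grep -i kafkautils

If this prints nothing, the class was not bundled into the jar.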

0 votes

Dependencies that the cluster provides at runtime must be excluded from the assembled JAR; if the cluster does not actually provide them, you should expect errors such as this one from the Java class loader during application startup.

An additional benefit of assembling without those dependencies is faster deployment. So if the cluster does provide a dependency at runtime, the best option is to mark it with % "provided".
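Applied to the build file in the question, that split might look like the sketch below. Versions are taken from the question; only the placement of "provided" differs:

libraryDependencies ++= Seq(
  // Provided by the cluster at runtime, so excluded from the assembly:
  "org.apache.spark" % "spark-core_2.10"      % "1.6.1" % "provided",
  "org.apache.spark" % "spark-streaming_2.10" % "1.6.1" % "provided",
  // Not part of the Spark distribution, so they must be bundled:
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1",
  "org.apache.kafka" % "kafka_2.10"                 % "0.8.2.1"
)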