
I wrote a Spark Streaming application which reads data from Kafka. I have built the jar with Spark 1.6.0 and Kafka 0.8.2.2. I am using the Kafka direct stream API:

KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet)
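
For context, the surrounding setup looks roughly like the following (a minimal sketch; the app name, broker list, topic name and batch interval are illustrative, not my exact values):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaDirectStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // broker list and topic are placeholder values
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topicSet = Set("my-topic")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicSet)

    // the DStream carries (key, value) pairs; print the values of each batch
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()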

When I run the application in yarn-client mode, it runs successfully, but when I run it in yarn-cluster mode it fails with the following exception:

User class threw exception: java.lang.NoClassDefFoundError: kafka/api/TopicMetadataRequest.

I have packaged the Kafka classes in the application jar, and at runtime kafka/api/TopicMetadataRequest does get loaded from the application jar.

As per my understanding, a NoClassDefFoundError occurs when there is a version mismatch between compile time and runtime.
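
A quick way to confirm the class really is in the assembly is to list the jar contents (the jar name below is illustrative):

jar tf myapp-assembly.jar | grep TopicMetadataRequest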

-----------EDIT------------

My .sbt file has the following block:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-mllib" % "1.6.0" % "provided",
      "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.0",
      "org.apache.kafka" % "kafka_2.10" % "0.8.2.2",
      "org.springframework.security" % "spring-security-web" % "3.0.7.RELEASE",
      "org.scalatest" % "scalatest_2.10" % "3.0.0-M12" % "test",
      "junit" % "junit" % "4.11",
      "com.typesafe.play" % "play_2.10" % "2.4.0-M2",
      "org.apache.httpcomponents" % "httpclient" % "4.2.5"
    )



    mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
      {
        case PathList("META-INF", xs @ _*) => MergeStrategy.discard
        case x => MergeStrategy.first
      }
    }

Any suggestions on how to resolve this, or why this exception is occurring?

Looks like you're missing a spark.jars setting to send your jar to the cluster. – maasg
As per the documentation, we don't need to give any option for the application jar, right? $ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options] – Alok
How did you do that: "I have built the jar with Spark 1.6.0 and Kafka 0.8.2.2"? How did you define the dependency on the spark-streaming-kafka Spark module? – Jacek Laskowski
I added the dependencies in the sbt file: "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.0", "org.apache.kafka" % "kafka_2.10" % "0.8.2.2" – Alok
You are building an 'assembly', right? – maasg

1 Answer


KafkaUtils is not part of Spark itself; you have to add the spark-streaming-kafka module to your Spark application separately.

You should use the --packages command-line option.

./bin/spark-shell --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0

Use the proper versions for Scala and Spark.
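
For the yarn-cluster scenario in the question, the same option can be passed to spark-submit (the class name, jar and options below are the placeholders from the question):

./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0 <app jar> [app options]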