
I'm not able to run Kafka with Spark Streaming. These are the steps I've taken so far:

  1. Downloaded the jar file "spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar" and moved it to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/jars

  2. Added this line to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf/spark-defaults.conf.template -> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0

Kafka Version: kafka_2.10-0.10.2.2

Jar file version: spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar

Python Code:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0 pyspark-shell'
kvs = KafkaUtils.createDirectStream(ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})

But I'm still getting the following error:

Py4JJavaError: An error occurred while calling o39.createDirectStreamWithoutMessageHandler.
: java.lang.NoClassDefFoundError: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$
    at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
    at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:59)

What am I doing wrong?

How did you configure your pom? Are you using `metrics-core-2.2.0.jar`? Try `spark-shell --jars metrics-core-2.2.0.jar` - pvy4917
You're using spark-2.0.0, but your jars are for 2.2.0... Those versions should be the same - OneCricketeer

1 Answer


spark-defaults.conf.template is only a template and is not read by Spark, so your JARs will not be loaded. You must copy/rename this file to remove the .template suffix.
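A minimal sketch of that rename, assuming the Spark install path from the question:

```shell
# Spark only reads spark-defaults.conf, not the .template file,
# so copy the template into place and edit the copy.
cd /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf
cp spark-defaults.conf.template spark-defaults.conf
```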

You'll also need to download Spark 2.2 if you want to use those specific JAR files.

And make sure that your Spark build uses Scala 2.10 if that's the Kafka package you want to use. Otherwise, use the 2.11 version.
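As a hypothetical example (assuming a Spark 2.2.0 build on Scala 2.11; `your_streaming_job.py` is a placeholder), the Scala suffix and Spark version in the package coordinate should both match the installation:

```shell
# Scala 2.11 build of Spark 2.2.0 -> _2.11 artifact suffix, version 2.2.0
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 \
  your_streaming_job.py
```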