I'm unable to get Kafka working with Spark Streaming. These are the steps I've taken so far:
1. Downloaded the jar file `spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar` and moved it to `/home/ec2-user/spark-2.0.0-bin-hadoop2.7/jars`.
2. Added this line to `/home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf/spark-defaults.conf.template`:
   `spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0`
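In case the file name matters here: as far as I know, Spark only loads `spark-defaults.conf` (without the `.template` suffix), so a sketch of the equivalent steps would be (paths and version string taken from above):

```shell
cd /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf
# Spark ignores *.template files; settings must live in spark-defaults.conf.
cp spark-defaults.conf.template spark-defaults.conf
echo 'spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0' >> spark-defaults.conf
```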
Kafka Version: kafka_2.10-0.10.2.2
Jar file version: spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar
Python Code:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10-2.2.0 pyspark-shell'
kvs = KafkaUtils.createDirectStream(ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})
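For reference, `--packages` expects a Maven coordinate in `group:artifact:version` form (colon-separated, not a hyphen before the version), and the version should match the running Spark and Scala build. A minimal sketch of building that string; the `2.0.0` version here is an assumption chosen to match the `spark-2.0.0-bin-hadoop2.7` install path above:

```python
import os

# Maven coordinate pieces, joined by colons into group:artifact:version.
# The Scala suffix (_2.10) and version (2.0.0, matching the Spark install
# directory above) are assumptions for illustration.
group = "org.apache.spark"
artifact = "spark-streaming-kafka-0-8_2.10"
version = "2.0.0"
coordinate = ":".join([group, artifact, version])

# Must be set before the SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--packages {} pyspark-shell".format(coordinate)
```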
But I'm still getting the following error:
Py4JJavaError: An error occurred while calling o39.createDirectStreamWithoutMessageHandler.
: java.lang.NoClassDefFoundError: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:59)
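A `NoClassDefFoundError` with "Could not initialize class" is often a sign that a static initializer failed, e.g. because a transitive dependency such as `metrics-core` did not load. One way to check what is actually on the classpath (a diagnostic sketch; the path is the install directory from above):

```shell
# List Kafka- and metrics-related jars in Spark's jars/ directory;
# each should appear exactly once and match the Spark/Scala versions.
ls /home/ec2-user/spark-2.0.0-bin-hadoop2.7/jars | grep -Ei 'kafka|metrics'
```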
What am I doing wrong?
Comments:
- pvy4917: Try `spark-shell --jars metrics-core-2.2.0.jar`
- OneCricketeer: spark-2.0.0, but your jars are for 2.2.0... Those versions should be the same.