With Apache Spark 2.1, I would like to use Kafka (0.10.0.2.5) as a source for Structured Streaming with pyspark.
The Kafka topic contains JSON messages (pushed with StreamSets Data Collector). However, I am not able to read them with the following code:
kafka = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:6667") \
    .option("subscribe", "mytopic") \
    .load()
msg = kafka.selectExpr("CAST(value AS STRING)")
disp = msg.writeStream.outputMode("append").format("console").start()
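Since the topic carries JSON, the CAST(value AS STRING) above should yield the raw JSON text of each message as a plain string. In plain Python terms (the payload fields below are invented purely for illustration, not taken from my actual topic):

```python
import json

# Hypothetical message value after CAST(value AS STRING):
# the raw JSON text pushed by the upstream collector.
# Field names here are made up for illustration only.
raw_value = '{"id": 1, "reading": 42.5}'

# Parsing the string gives back the structured record.
record = json.loads(raw_value)
print(record["reading"])  # -> 42.5
```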
It generates this error:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
I tried adding these options at the readStream line:
.option("value.serializer","org.common.serialization.StringSerializer")
.option("key.serializer","org.common.serialization.StringSerializer")
But it does not solve the problem.
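From what I understand, a NoClassDefFoundError on ByteArrayDeserializer suggests the Kafka client classes are missing from the classpath rather than a serializer-option problem, so the Kafka source package may need to be supplied at launch. A sketch of such a launch command (the package coordinates below are my assumption for Spark 2.1 built against Scala 2.11):

```shell
# Launch pyspark with the Structured Streaming Kafka source pulled in
# via --packages; this also brings in the kafka-clients dependency.
# Coordinates are an assumption for Spark 2.1 / Scala 2.11.
pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0
```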
Any idea? Thank you in advance.
Comments:
"org.apache.kafka.common.serialization.StringDeserializer for both key and value deserializer" - Akash Sethi
https://github.com/akashsethi24/Spark-Kafka-Stream-Example/blob/master/src/main/scala/KafkaConsumer.scala - Akash Sethi