I'd like to run a Spark application using Structured Streaming with PySpark.
I use Spark 2.2 with Kafka 0.10 version.
I fail with the following error:
java.lang.IncompatibleClassChangeError: Implementing class
spark-submit command used as below:
/bin/spark-submit \
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 \
--master local[*] \
/home/umar/structured_streaming.py localhost:2181 fortesting
structured_streaming.py code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("StructuredStreaming").config("spark.driver.memory", "2g").config("spark.executor.memory", "2g").getOrCreate()
raw_DF = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:2181").option("subscribe", "fortesting").load()
values = raw_DF.selectExpr("CAST(value AS STRING)").as[String]
values.writeStream.trigger(ProcessingTime("5 seconds")).outputMode("append").format("console").start().awaitTermination()