I'd like to run a Spark application using Structured Streaming with PySpark.

I use Spark 2.2 with Kafka 0.10 version.

It fails with the following error:

java.lang.IncompatibleClassChangeError: Implementing class

The spark-submit command used:

/bin/spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 \
  --master local[*] \
  /home/umar/structured_streaming.py localhost:2181 fortesting

structured_streaming.py code:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("StructuredStreaming") \
    .config("spark.driver.memory", "2g") \
    .config("spark.executor.memory", "2g") \
    .getOrCreate()

raw_DF = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:2181") \
    .option("subscribe", "fortesting") \
    .load()

values = raw_DF.selectExpr("CAST(value AS STRING)").as[String]

values.writeStream \
    .trigger(ProcessingTime("5 seconds")) \
    .outputMode("append") \
    .format("console") \
    .start() \
    .awaitTermination()
2 Answers


You need spark-sql-kafka for structured streaming:

--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0

Also make sure that you use the same versions of Scala (2.11 above) and Spark (2.2.0) as you use on your cluster.
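As a sketch, here is the asker's script rewritten with that package fix and in valid PySpark syntax: `.as[String]` is Scala, not Python (in PySpark, `selectExpr` already returns a DataFrame of strings), and the trigger interval is set with the `processingTime` keyword argument rather than a `ProcessingTime` object. Note also that `localhost:2181` is the ZooKeeper port; `kafka.bootstrap.servers` expects the Kafka broker address, which is commonly `localhost:9092` (an assumption about the asker's setup, adjust as needed):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("StructuredStreaming")
         .config("spark.driver.memory", "2g")
         .config("spark.executor.memory", "2g")
         .getOrCreate())

# kafka.bootstrap.servers must point at the Kafka broker (default port 9092),
# not ZooKeeper (2181). localhost:9092 is assumed here; adjust for your setup.
raw_DF = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "fortesting")
          .load())

# No Dataset[String] in PySpark; the CAST in selectExpr is sufficient.
values = raw_DF.selectExpr("CAST(value AS STRING)")

# PySpark sets the trigger via the processingTime keyword argument.
query = (values.writeStream
         .trigger(processingTime="5 seconds")
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```

Submitted with the corrected package, e.g. `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 structured_streaming.py`.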


Please reference this.

You're using spark-streaming-kafka-0-10, which does not currently support Python.