The consumer handles compressed messages on its own; however, there are a few things to consider. I was receiving this warning:
19/07/12 17:49:15 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID
1, 10.0.2.15, executor 0): java.lang.AssertionError: assertion failed:
Got wrong record for spark-executor-1 public_test1 5 even after
seeking to offset 1
I solved this by upgrading to version 2.4.0 of spark-streaming-kafka-0-10_2.11 and also setting spark.streaming.kafka.allowNonConsecutiveOffsets=true.
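If you prefer to set the property in code instead of on the command line, you can put it on the SparkConf before building the streaming context. This is a minimal sketch; the app name and batch interval are placeholders, not taken from the original post:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Allow gaps in offsets (e.g. compacted or transactional topics),
// which otherwise trigger the "Got wrong record ... even after seeking
// to offset" assertion in spark-streaming-kafka-0-10.
val conf = new SparkConf()
  .setAppName("streamtest") // placeholder name
  .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")

val ssc = new StreamingContext(conf, Seconds(5)) // batch interval is illustrative
```

Either way works; the --conf flag on spark-submit and the SparkConf setting are equivalent, with the programmatic value taking effect when the context is created.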
My submit command looks like this:
spark-submit --class com.streamtest.Main --master
spark://myparkhost:7077 --packages
org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0,org.apache.spark:spark-streaming_2.11:2.3.0,org.apache.spark:spark-core_2.11:2.3.0
--conf spark.streaming.kafka.allowNonConsecutiveOffsets=true /work/streamapp/build/libs/streamapp.jar
I hope this helps anyone who runs into the same problem.