2 votes

I see that we need to make changes on the producer side to use GZIP compression, but I am not sure how to decompress the messages while reading them. Please throw some light on where to start. I have my end-to-end streaming working for uncompressed messages.

Thanks


2 Answers

4 votes

It looks like decompression is handled seamlessly by the consumer; you don't have to do anything on that side. All you have to do is configure the producer with the compression setting: "compression.type" in the modern Java producer ("compression.codec" was the name in the legacy Scala producer).

Please take a look at this link.
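
For what it's worth, here is a minimal producer sketch with GZIP enabled, assuming the modern Java client, a local broker at localhost:9092, and String keys/values (the broker address and topic name are illustrative):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Batches are gzipped by the producer; consumers decompress transparently.
        props.put("compression.type", "gzip");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name is illustrative; close() flushes the pending batch.
            producer.send(new ProducerRecord<>("public_test1", "key-1", "compressed payload"));
        }
    }
}

The consumer needs no matching setting: the compression codec is recorded in each batch, and the consumer inflates it transparently when reading.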

0 votes

The consumer takes care of compressed messages on its own. However, there are a few things to consider. I was receiving this warning:

19/07/12 17:49:15 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.0.2.15, executor 0): java.lang.AssertionError: assertion failed: Got wrong record for spark-executor-1 public_test1 5 even after seeking to offset 1

I solved this by upgrading to version 2.4.0 of spark-streaming-kafka-0-10_2.11 and also setting spark.streaming.kafka.allowNonConsecutiveOffsets=true.

My submit command looks like:

spark-submit --class com.streamtest.Main --master spark://myparkhost:7077 --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0,org.apache.spark:spark-streaming_2.11:2.3.0,org.apache.spark:spark-core_2.11:2.3.0 --conf spark.streaming.kafka.allowNonConsecutiveOffsets=true /work/streamapp/build/libs/streamapp.jar
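
For reference, the same flag can also be set programmatically on the SparkConf instead of via --conf. A minimal sketch of the driver side, assuming a Java app; the broker address, group id, and batch interval are made up, and the topic name public_test1 is taken from the warning above:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
            .setAppName("streamtest")
            // Same flag as the --conf above: tolerate gaps in offsets.
            .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true");

        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumption
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "streamtest-group"); // hypothetical group id
        // Note: no decompression settings are needed anywhere on the consumer side.

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Arrays.asList("public_test1"), kafkaParams));

        stream.foreachRDD(rdd -> rdd.foreach(record ->
            System.out.println(record.topic() + " " + record.value())));

        jssc.start();
        jssc.awaitTermination();
    }
}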

I hope this is useful to anybody with the same problem I had.