I have recently been using Spark Streaming to process data from Kafka.
After the application starts and a few batches have finished, the batches become consistently delayed.
Most of the time, data processing is completed within 1-5 seconds.
However, after several batches, every batch consistently takes 41 to 45 seconds, and most of the delay occurs in stage 0, in the part that fetches the data.
I happened to notice that the Kafka request.timeout.ms setting defaults to 40 seconds, so I changed it to 10 seconds.
I then restarted the application and observed that the batches now complete in 11 to 15 seconds.
The actual processing time is 1-5 seconds, so I cannot understand this delay.
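To make the pattern in these numbers explicit: both observations fit a rough model in which a slow batch waits out one full request.timeout.ms and then does its usual processing. The sketch below only reproduces that arithmetic; the class and method names are made up for illustration, and the model itself is an assumption drawn from the figures, not a confirmed diagnosis.

```java
// Rough model (an assumption, not a confirmed diagnosis): a slow batch waits
// out one full request.timeout.ms, then does its usual 1-5 s of processing.
final class BatchDelayEstimate {

    // Returns {shortest, longest} expected batch duration in seconds.
    static long[] expectedRangeSec(long requestTimeoutSec,
                                   long minProcessingSec,
                                   long maxProcessingSec) {
        return new long[] {
            requestTimeoutSec + minProcessingSec,
            requestTimeoutSec + maxProcessingSec
        };
    }

    public static void main(String[] args) {
        // 40 s is the timeout I started with, 10 s is the value I changed it to.
        for (long timeoutSec : new long[] {40, 10}) {
            long[] range = expectedRangeSec(timeoutSec, 1, 5);
            System.out.println("timeout " + timeoutSec + "s: "
                    + range[0] + " to " + range[1] + "s per batch");
        }
    }
}
```

With a 40-second timeout this gives 41 to 45 seconds, and with a 10-second timeout it gives 11 to 15 seconds, which matches what I observed in both runs.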
What is wrong?
My environment is as follows:
Spark Streaming : 2.1.0 (createDirectStream)
Kafka : 0.10.1
Batch interval : 20s
request.timeout.ms : 10s
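For reference, a minimal sketch of how the timeout might be set in the Kafka parameter map that is handed to the direct stream. Only request.timeout.ms and its value come from my setup; the broker address, group id, and deserializers are placeholder assumptions, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class KafkaParamsSketch {

    // Builds the consumer settings; everything except request.timeout.ms
    // is a placeholder and would differ in a real deployment.
    static Map<String, Object> build(int requestTimeoutMs) {
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        kafkaParams.put("group.id", "example-group");           // placeholder group id
        kafkaParams.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The setting in question: 10 s instead of the 40 s I saw as the default.
        kafkaParams.put("request.timeout.ms", requestTimeoutMs);
        return kafkaParams;
    }

    public static void main(String[] args) {
        System.out.println(build(10000));
    }
}
```

Other consumer timeouts (for example session.timeout.ms) may need to stay consistent with this value; they are left at their defaults in this sketch.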
/////
The following capture is the graph when request.timeout.ms is set to 8 seconds.
