I'm using Kafka 0.9 and Spark 1.6. Spark Streaming application streams messages from Kafka through direct stream API (Version 2.10-1.6.0).
I have 3 workers with 8 GB memory each. For every minute I get 4000 messages to Kafka and in spark each worker is streaming 600 messages. I always see a lag on the Kafka offset to Spark offset.
I have 5 Kafka partitions.
Is there a way to make Spark stream more messages for each pull from Kafka?
My streaming frequency is 2 seconds
spark configurations in the app
"maxCoresForJob": 3,
"durationInMilis": 2000,
"auto.offset.reset": "largest",
"autocommit.enable": "true",