
I'm using Kafka 0.9 and Spark 1.6. My Spark Streaming application reads messages from Kafka through the direct stream API (version 2.10-1.6.0).

I have 3 workers with 8 GB of memory each. Kafka receives about 4000 messages every minute, while in Spark each worker streams 600 messages. I always see the Spark offset lagging behind the Kafka offset.

I have 5 Kafka partitions.

Is there a way to make Spark stream more messages for each pull from Kafka?

My streaming batch interval is 2 seconds.

Spark configuration in the app:

"maxCoresForJob": 3,
"durationInMilis": 2000,
"auto.offset.reset": "largest",
"autocommit.enable": "true",
Please include more details, including the version of the API and the configuration. - Alper t. Turker

1 Answer


Would you please explain more? Did you check which piece of code is taking the longest to execute? In Cloudera Manager, go to YARN -> Applications -> select your application -> Application Master -> Streaming, then click on one of the batches. Try to find out which task takes the longest to execute. How many executors are you using? For 5 partitions, it is better to have 5 executors.
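As a rough sanity check on the numbers in the question, here is a small Python sketch of the per-batch load. It rests on several assumptions: messages are spread evenly over the 5 partitions, the direct stream creates one task per Kafka partition per batch, and `maxCoresForJob` caps how many tasks can run at the same time. That last point is only my guess at what the setting means in your application.

```python
import math

# Figures quoted in the question.
messages_per_minute = 4000
batch_interval_s = 2      # "durationInMilis": 2000
kafka_partitions = 5
cores = 3                 # "maxCoresForJob": 3 (assumed to cap concurrent tasks)

# Average number of messages arriving during one batch interval.
messages_per_batch = messages_per_minute / 60 * batch_interval_s

# Assumption: one task per Kafka partition per batch, with an even spread.
tasks_per_batch = kafka_partitions
messages_per_task = messages_per_batch / tasks_per_batch

# With fewer cores than tasks, the tasks of a batch run in several waves.
waves = math.ceil(tasks_per_batch / cores)

print(f"messages per batch: {messages_per_batch:.1f}")   # 133.3
print(f"messages per task:  {messages_per_task:.1f}")    # 26.7
print(f"task waves per batch: {waves}")                  # 2
```

If these assumptions hold, each batch carries only about 133 messages, so a growing lag would point at slow per-batch processing rather than at the volume pulled from Kafka. With 3 cores for 5 partitions, every batch needs two waves of tasks, which is one reason to match the number of cores or executors to the number of partitions.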

You can post your transformation logic, there could be some way to tune.

Thanks