0
votes

A very simple application running on a Spark cluster with 2 workers, using Kinesis with 2 shards.

And I check the Kinesis Streams Application State on DynamoDB (show in this screenshot) at region North Virginia.

I start and stop workers from time to time, and I just noticed, when the leaseOwner for 2 shards is the same worker, application works fine.

But when I stop the current leaseOwner (10.0.7.63), then there will be a owner switch and new owner will be the other worker (10.0.7.62), then my application pulls data and no data returned from Kinesis (but, the connection with Kinesis is still on).

My guess, is that when the owner is switched to another worker, the checkpoints on the new owner is not matching what is left inside Kinesis, and the pulling the data will get nothing.

Could anyone please explain a bit what's going on here? Am I guessing it right?

Thanks a lot.

enter image description here

1
First of all, just a friendly reminder; define "workerID" in the configuration of your application with hostname; it will help you with more user friendly names. Second, are you sure the shard-000 receives data? Maybe you've set a static partition key on consumer side and that is causing the data to stack on only shard-001?az3
@az3 oh you are right! I'm using a static partition key! I forgot it myself...thank you so much helping me out :)keypoint
@az3 - suggest putting that as an answer :)Krease
I'll rewrite that as an answer, thanks!az3

1 Answers

0
votes

First of all, just a friendly reminder; define "workerID" in the configuration of your application with hostname; it will help you with more user friendly names.

Second, are you sure the shard-000 receives data? Maybe you've set a static partition key on consumer side and that is causing the data to stack on only shard-001?