0 votes

Shards can be added or removed dynamically using the Kinesis API. Also, the Kinesis data producer needs to set a proper "partition key" (which, as best I can tell, maps to a shard) on its PutRecord API calls. So it seems like your data producer(s) also need to be aware of the dynamic scaling, either to take advantage of new shards or to stop sending to removed ones.
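
For reference, a single PutRecord call looks roughly like this with the AWS SDK for Java v2 (the stream name and partition key here are just placeholders):

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordResponse;

public class ProducerExample {
    public static void main(String[] args) {
        try (KinesisClient kinesis = KinesisClient.create()) {
            PutRecordResponse response = kinesis.putRecord(PutRecordRequest.builder()
                    .streamName("my-stream")                  // placeholder stream name
                    .partitionKey("user-42")                  // the key the producer has to choose
                    .data(SdkBytes.fromUtf8String("payload"))
                    .build());
            // The response shows which shard the service picked for this key.
            System.out.println("Record landed on shard " + response.shardId());
        }
    }
}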

Question: How do my data producers dynamically keep track of the number of shards available on a particular stream and create partition keys to match them?


1 Answer

0 votes

Producers should keep working exactly the same way during re-sharding.

As far as I know, Kafka maintains a mapping of partition info in the client API itself:

public final class Cluster {
    // metadata for every known partition, keyed by (topic, partition)
    private final Map<TopicPartition, PartitionInfo> partitionsByTopicPartition;
    // per topic, only the partitions that currently have a live leader
    private final Map<String, List<PartitionInfo>> availablePartitionsByTopic;

    // ...
}

And on each send(message, partitionKey), the client gets the number of available partitions from this Cluster snapshot (which the producer keeps refreshed from broker metadata, so partition-count changes are picked up automatically) and uses it to calculate the partition; see DefaultPartitioner:

public class DefaultPartitioner implements Partitioner {

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // all partitions of the topic, from the current cluster metadata
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);

        // only the partitions with a live leader; used for keyless round-robin
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);

        // ... keyed records are then hashed client-side, e.g.:
        // return Utils.toPositive(Utils.murmur2(keyBytes)) % partitions.size();
    }
}

The Kinesis client, by contrast, simply takes the partitionKey parameter. I think that once the record is sent to the Kinesis service, the service decides the exact shard it should go to, based on however many shards are currently active (it hashes the partition key with MD5 and routes the record to the shard whose hash key range contains the result). So I believe the partition/shard calculation is hidden from the client API.
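
If you want to see that server-side mapping from the outside, you can reproduce it yourself: hash the partition key with MD5 and compare it against each shard's hash key range returned by ListShards. A rough sketch with the AWS SDK for Java v2 (the stream name and key are placeholders):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.ListShardsRequest;
import software.amazon.awssdk.services.kinesis.model.Shard;

public class ShardResolver {
    public static void main(String[] args) throws Exception {
        String partitionKey = "user-42"; // placeholder key

        // MD5 of the partition key, interpreted as an unsigned 128-bit integer;
        // this is how Kinesis maps keys onto shard hash key ranges.
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
        BigInteger hash = new BigInteger(1, digest);

        try (KinesisClient kinesis = KinesisClient.create()) {
            // note: after re-sharding, the list also contains closed parent shards
            for (Shard shard : kinesis.listShards(
                    ListShardsRequest.builder().streamName("my-stream").build()).shards()) {
                BigInteger start = new BigInteger(shard.hashKeyRange().startingHashKey());
                BigInteger end = new BigInteger(shard.hashKeyRange().endingHashKey());
                if (hash.compareTo(start) >= 0 && hash.compareTo(end) <= 0) {
                    System.out.println(partitionKey + " -> " + shard.shardId());
                }
            }
        }
    }
}

This is only for inspection, though; in normal operation the producer never needs to do this, since the service performs the routing.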