3
votes

If you plan to subscribe more CloudWatch log data to a specific Kinesis stream than a single shard can handle, is it possible to scale your stream by adding multiple shards, and then to distribute the multiple CloudWatch log subscriptions across those shards?

The docs here gloss over shard handling, referring only to "shardId-000000000000".

The API docs (at least for the .NET SDK) suggest that a destination ARN is specified when creating the subscription. My understanding is that an ARN can be no more specific than a Kinesis stream, since I do not think individual shards are assigned ARNs.

Essentially, if you're planning on subscribing more CloudWatch data than a single shard can handle, is there a way to "scale up" your stream into a multi-shard stream (while using CloudWatch subscriptions and avoiding writing a custom client to process the data), or will it strictly be necessary to "scale out" into multiple single-shard streams?

1
You can split shards to increase Kinesis stream capacity: docs.aws.amazon.com/kinesis/latest/APIReference/… - Guy
Yes, of course. But when you use the Kinesis API to transmit records to a Kinesis stream, you must be somewhat explicit about which shard in the stream you want your record to go to (see the required PartitionKey parameter here: docs.aws.amazon.com/kinesis/latest/APIReference/… ). The CloudWatch documentation makes no mention of such a parameter, nor of how it might behave in the absence of one. - user74754
You are not writing to a shard by ID; you are writing using the partition key. Each of your shards has a range of hash keys, and the partition key will be mapped to one of them. Therefore, you only need to set the partition key in your API call. - Guy
There is no API call when you use a CloudWatch subscription, other than the call you initially use to create the subscription. That call only allows you to specify an ARN for the Kinesis stream; it gives you no control over the shard or partition key. aws.amazon.com/about-aws/whats-new/2015/06/… - user74754
You are right. The partition key is used in regular Kinesis calls both to allow distribution and to keep related records together (i.e. in the same shard). In a CloudWatch Logs subscription, this is less relevant and you mainly need even distribution across the shards. Therefore a random key is probably used. If you need more events (or less filtering), you can add more shards to the Kinesis stream. - Guy
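
To make the partition-key-to-shard mapping discussed in these comments concrete, here is a minimal Python sketch. It relies on the documented behaviour that Kinesis maps a partition key to a 128-bit hash key with MD5. The helper names (`even_shard_ranges`, `hash_key`, `shard_index`) and the assumption that the shards split the hash key space evenly are illustrative only; a real stream's ranges come from its shard descriptions and may be uneven after splits or merges.

```python
import hashlib

MAX_HASH = 2**128  # Kinesis hash keys span 0 .. 2**128 - 1


def even_shard_ranges(shard_count):
    """Split the hash key space into contiguous, roughly equal ranges.

    Assumption: evenly sized shards, as in a freshly created stream.
    """
    width = MAX_HASH // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * width
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH - 1 if i == shard_count - 1 else (i + 1) * width - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key):
    """MD5 of the UTF-8 partition key, read as a 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key, ranges):
    """Return the index of the range whose hash keys contain this key's hash."""
    h = hash_key(partition_key)
    for i, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return i
    raise ValueError("hash key not covered by any range")


ranges = even_shard_ranges(4)
chosen = shard_index("some-partition-key", ranges)  # an index in 0..3
```

The caller only ever supplies a partition key; the shard follows from the hash, which is why no shard ID or shard ARN appears in the write path.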

1 Answer

6
votes

I received this answer from my org's AWS representative:

The CloudWatch subscription internally creates a PartitionKey for each message based on all of the following parameters: the ownerId, the logGroupName, and the logStreamName.

Based on the lack of mention in the documentation, I had assumed that the partition key was pretty much neglected by the CloudWatch subscription system. Instead, it appears that you automatically get a pretty decent mechanism to distribute your messages across your stream's shards.
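
As a rough illustration of what that would mean in practice, the Python sketch below counts how keys built from those three fields spread over a stream. Everything concrete in it is an assumption: the underscore-joined key format, the account ID, the log group and log stream names, and the four evenly sized shards are all hypothetical, since the representative's answer does not say how the three values are combined.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # hypothetical stream size, evenly split hash key ranges assumed


def make_partition_key(owner_id, log_group, log_stream):
    # Assumption: the real key format is not documented; joining the three
    # fields with underscores is purely for illustration.
    return f"{owner_id}_{log_group}_{log_stream}"


def shard_for(partition_key, shard_count=SHARD_COUNT):
    """Map a key to a shard index via MD5, assuming equal-width hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // 2**128


# 1000 hypothetical log streams in one log group of one account.
keys = [
    make_partition_key("123456789012", "/app/web", f"instance-{i}")
    for i in range(1000)
]
counts = Counter(shard_for(k) for k in keys)

for shard, n in sorted(counts.items()):
    print(f"shard {shard}: {n} log streams")
```

If the key really depends only on these three fields, every event from a given log stream carries the same key and lands on the same shard. Many log streams should spread out well, but a single very busy log stream would not be spread by adding shards.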