I'm using Kinesis Firehose to compress and save events to S3. The prefix format is YYYY/MM/DD/HH in UTC. The events sent to Firehose also contain a UTC timestamp as a field. Using this field, I discovered that the YYYY/MM/DD/HH objects in S3 also contain events from the preceding and following hours.
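To illustrate, a check along these lines reproduces what I'm seeing (a sketch; it assumes the objects are gzip-compressed newline-delimited JSON and uses a hypothetical `timestamp` field holding an offset-aware ISO 8601 UTC string):

```python
import gzip
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def events_outside_prefix_hour(bucket, key):
    """Return events whose UTC timestamp falls outside the hour in the key.

    Assumes a key like 'events/2019/07/23/14/<file>' and gzipped
    newline-delimited JSON records with a hypothetical 'timestamp' field.
    """
    # Recover the YYYY/MM/DD/HH portion from the object key.
    parts = key.split("/")
    year, month, day, hour = (int(p) for p in parts[-5:-1])
    prefix_hour = datetime(year, month, day, hour, tzinfo=timezone.utc)

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    outside = []
    for line in gzip.decompress(body).splitlines():
        event = json.loads(line)
        # Normalize a trailing 'Z' so fromisoformat accepts it.
        raw = event["timestamp"].replace("Z", "+00:00")
        ts = datetime.fromisoformat(raw).astimezone(timezone.utc)
        if ts.replace(minute=0, second=0, microsecond=0) != prefix_hour:
            outside.append(event)
    return outside
```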

The buffer limits I'm using are 128 MB / 600 seconds.
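For reference, these limits correspond to the `BufferingHints` set when the delivery stream is created. A minimal boto3 sketch (the stream name, role ARN, and bucket ARN are placeholders):

```python
import boto3

firehose = boto3.client("firehose")

# Buffering hints: Firehose delivers a buffer when EITHER threshold is hit.
# 128 is the maximum SizeInMBs; 600 is within the allowed interval range.
firehose.create_delivery_stream(
    DeliveryStreamName="events-to-s3",  # placeholder name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-events-bucket",               # placeholder
        "Prefix": "events/",  # Firehose appends the UTC YYYY/MM/DD/HH/ by default
        "CompressionFormat": "GZIP",
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 600},
    },
)
```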

Do you know whether these are hard limits, or whether there's a chance that events are buffered outside them?

1 Answer


I would expect it to be possible to get events outside those limits. For example, if Firehose happens to read a few more records than fit in the buffer size, it probably won't defer them to the next buffer. But I can't say that for sure.

More important to your use case: Firehose writes a buffer when either of these limits is reached, then starts anew. So if you accumulate 128 MB in 373 seconds, it won't wait another 227 seconds before writing that buffer. When this happens, you'll see more than 6 S3 objects per hour.
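One way to confirm this is to count the objects delivered under a single hour prefix; with a 600-second interval, anything above ~6 objects in an hour means the size limit fired first. A sketch, assuming the bucket name and prefix layout from your question:

```python
import boto3

s3 = boto3.client("s3")

def objects_in_hour(bucket, year, month, day, hour, base_prefix="events/"):
    """Count S3 objects delivered under a single YYYY/MM/DD/HH prefix."""
    prefix = f"{base_prefix}{year:04d}/{month:02d}/{day:02d}/{hour:02d}/"
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        count += len(page.get("Contents", []))
    return count

# With a 600-second interval you'd expect at most ~6 time-triggered flushes
# per hour; a higher count means the 128 MB size limit was hit first.
print(objects_in_hour("my-events-bucket", 2019, 7, 23, 14))
```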

It's also entirely possible that the records were not written to the stream immediately, due to throughput throttling. Depending on how you handle retries, you could see delays of several seconds between an event's timestamp and its arrival at Firehose.
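As an illustration, a producer that backs off when Firehose throttles it can easily push a record into a later buffer than its timestamp suggests (a sketch; the stream name and backoff policy are placeholders):

```python
import time

import boto3
from botocore.exceptions import ClientError

firehose = boto3.client("firehose")

def put_with_retries(data: bytes, max_attempts: int = 5):
    """Send one record, backing off when Firehose throttles the stream.

    Each retry delays the record's arrival, so its event timestamp can
    land in an earlier hour than the buffer it's finally written from.
    """
    for attempt in range(max_attempts):
        try:
            return firehose.put_record(
                DeliveryStreamName="events-to-s3",  # placeholder name
                Record={"Data": data},
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ServiceUnavailableException":
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1, 2, 4, ... seconds
    raise RuntimeError("record not accepted after retries")
```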