
I need to implement a streaming solution using AWS Kinesis Data Streams and Lambda.

Lambda function 1 -

It adds data to the stream and is invoked every 10 seconds. Each invocation adds 100 records (1 KB each) to the stream. I am running two instances of the script that invokes this Lambda function.

Lambda function 2 -

This Lambda uses the stream above as its trigger. With a small volume of data per interval, the second Lambda receives the data almost immediately. But at the load described above, data arrives later and later (about 10 minutes behind after more than 1 hour of streaming).

I checked the logic of both Lambda functions and verified that the first Lambda does not add any latency before pushing data to the stream. I also verified this from the stream records in the second Lambda: the difference between approximateArrivalTimestamp and the current time keeps increasing.
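A minimal sketch of how that gap can be measured inside the second Lambda, assuming the Python runtime and the standard Kinesis event shape, where each record's `kinesis.approximateArrivalTimestamp` is epoch seconds. `process_payload` is a hypothetical placeholder for the real work. Lambda's `IteratorAge` CloudWatch metric reports essentially the same number for the whole event source mapping.

```python
import base64
import time
from typing import List, Optional


def process_payload(payload: bytes) -> None:
    """Hypothetical placeholder for the real per-record processing."""
    pass


def record_lags(event: dict, now: Optional[float] = None) -> List[float]:
    """Seconds between each record's arrival in Kinesis and 'now'."""
    now = time.time() if now is None else now
    return [
        now - float(record["kinesis"]["approximateArrivalTimestamp"])
        for record in event.get("Records", [])
    ]


def handler(event, context):
    lags = record_lags(event)
    if lags:
        # One log line per batch: a max lag that keeps growing means the consumer is falling behind.
        print(f"batch={len(lags)} max_lag_s={max(lags):.1f} min_lag_s={min(lags):.1f}")
    for record in event.get("Records", []):
        # Kinesis record data is delivered base64-encoded.
        process_payload(base64.b64decode(record["kinesis"]["data"]))
    return {"processed": len(event.get("Records", []))}
```

If the logged lag grows steadily while the producer is on time, the delay is accumulating on the consumer side of the shard rather than in the producer.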

Kinesis itself did not show any issues or throttling in its metrics (I am using 1 shard).

Are there any architectural changes I need to make for this to run more smoothly? I need to scale up at least 10 times, e.g. 20 invocations of the first Lambda with 200 records each, with a timeout of 1 to 10 seconds in later benchmarks.
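A rough sizing sketch for the write side, assuming the 1 to 10 second figure is the interval between producer invocations, 1 KB means 1024 bytes, and the provisioned-mode per-shard write limits of 1,000 records/s and 1 MiB/s. `shards_needed` is a hypothetical helper, and it only covers writes; the consumer side also has to keep up, which is where the lag above comes from.

```python
import math

# Per-shard write limits for Kinesis Data Streams in provisioned mode.
SHARD_WRITE_RECORDS_PER_S = 1_000
SHARD_WRITE_BYTES_PER_S = 1024 * 1024  # 1 MiB/s


def shards_needed(producers: int, records_per_invocation: int,
                  record_bytes: int, interval_s: float) -> int:
    """Minimum shard count so that neither write limit is exceeded."""
    records_per_s = producers * records_per_invocation / interval_s
    bytes_per_s = records_per_s * record_bytes
    return max(
        1,
        math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S),
        math.ceil(bytes_per_s / SHARD_WRITE_BYTES_PER_S),
    )


print(shards_needed(2, 100, 1024, 10))   # current load: 20 records/s
print(shards_needed(20, 200, 1024, 1))   # scaled load at a 1 s interval: 4,000 records/s
print(shards_needed(20, 200, 1024, 10))  # scaled load at a 10 s interval: 400 records/s
```

The current load is far below one shard's write limits, which matches the absence of throttling; at a 1 second interval the scaled load would need at least 4 shards for writes alone.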

I am using 100 as the batch size. Would increasing or decreasing it help?

UPDATE: After more research, I found suggestions to put an async, front-facing Lambda on the Kinesis stream that in turn invokes the actual Lambda asynchronously, so that the processing time of the actual Lambda does not become the bottleneck. However, this approach failed as well, with the same latency issue. I checked the execution time: the front-facing Lambda finishes in 1 second, but I still see a large gap between approximateArrivalTimestamp and the current time in both Lambdas.

Please help!


Answer


With one shard, there will be only one concurrent instance of the second Lambda (with the default event source mapping settings).

So it works like this for the second Lambda: it reads a batch of records (up to the configured batch size) from the stream and processes it. It won't read further records from that shard until the previous batch has been processed successfully.
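A toy simulation of that serial behaviour, to show why the lag grows instead of staying constant. It assumes records arrive at a steady rate, the consumer idles 1 second when the shard is empty, and every batch takes the same `batch_time_s` regardless of size (i.e. per-invocation overhead dominates). All names and numbers are illustrative, not measured.

```python
def iterator_age_after(duration_s: float, arrival_rate: float, batch_size: int,
                       batch_time_s: float, idle_poll_s: float = 1.0) -> float:
    """Age (s) of the oldest unprocessed record after duration_s, for one shard
    read by one consumer that handles batches strictly one after another.

    Record i arrives at i / arrival_rate. Assumption: every batch takes
    batch_time_s no matter how many records it contains.
    """
    t = 0.0
    processed = 0
    while t < duration_s:
        arrived = int(t * arrival_rate)
        backlog = arrived - processed
        if backlog <= 0:
            t += idle_poll_s  # shard is empty: wait before polling again
            continue
        processed += min(batch_size, backlog)
        t += batch_time_s     # the next batch is not read until this one finishes
    oldest_unprocessed_arrival = processed / arrival_rate
    return max(0.0, t - oldest_unprocessed_arrival)


# 20 records/s in, batches of 100:
print(iterator_age_after(3600, 20, 100, 2.0))   # capacity 50 records/s: lag stays small
print(iterator_age_after(3600, 20, 100, 10.0))  # capacity 10 records/s: lag grows every batch
print(iterator_age_after(3600, 20, 300, 10.0))  # bigger batches, same batch time: keeps up
```

Whenever `batch_size / batch_time_s` is below the arrival rate, the lag grows linearly with time, which matches the symptom in the question. A larger batch size only helps when much of each batch's time is fixed overhead; if the time is mostly per-record work, it will not change throughput much and more parallelism (more shards) is needed.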

If you add a second shard, two Lambda instances will process records in parallel. So the way I see to scale this architecture is to increase the number of shards, while making sure the data is evenly distributed across them (for example, by using well-spread partition keys).
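A small sketch of how partition keys decide which shard (and therefore which consumer) gets a record. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. This sketch assumes UTF-8 encoding of the key and evenly split, contiguous ranges; a real stream's ranges should be read from `ListShards`. `shard_for_key` and `distribution` are hypothetical helpers.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_SPACE = 2 ** 128  # MD5 maps each partition key into [0, 2**128)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (assumed evenly split) hash range holds this key."""
    hashed = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so any remainder at the top of the space falls into the last shard.
    return min(hashed // range_size, shard_count - 1)


def distribution(keys: Iterable[str], shard_count: int) -> Counter:
    """How many of the given keys land on each shard."""
    return Counter(shard_for_key(k, shard_count) for k in keys)


# Many distinct keys spread out; a single constant key pins everything to one shard.
print(distribution((f"device-{i}" for i in range(10_000)), 4))
print(distribution(["same-key"] * 10_000, 4))
```

With a constant partition key, adding shards does not add parallelism, because every record still hashes to the same shard and is processed by a single consumer.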