I needed to implement a stream solution using AWS Kinesis streams & Lambda.
Lambda function 1 -
It adds data to the stream and is invoked every 10 seconds. Each invocation adds 100 records (1 KB each) to the stream. I am running two instances of the script that invokes this Lambda function.
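For illustration only, a minimal sketch of what such a producer could look like in Python. It assumes the records are written with the Kinesis PutRecords API through a boto3 client passed in by the caller; `build_records`, `put_batch`, and the random partition key are hypothetical choices, not the actual code of the first Lambda.

```python
import json
import uuid

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def build_records(payloads):
    """Turn payload dicts into the Records list that PutRecords expects."""
    return [
        {
            "Data": json.dumps(payload).encode("utf-8"),
            # Assumption: a random key, so records spread evenly if shards are added later.
            "PartitionKey": str(uuid.uuid4()),
        }
        for payload in payloads
    ]


def put_batch(client, stream_name, payloads):
    """Send all payloads in one PutRecords call and return the number of failed records.

    `client` is expected to behave like boto3.client("kinesis").
    """
    if len(payloads) > MAX_RECORDS_PER_CALL:
        raise ValueError("split the payloads into chunks of at most 500 records")
    response = client.put_records(StreamName=stream_name, Records=build_records(payloads))
    return response["FailedRecordCount"]
```

A non-zero return value means some records were rejected (for example by throttling) and would need to be retried by the caller.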
Lambda function 2 -
This Lambda uses the above stream as its trigger. With a small volume of data per interval, the second Lambda gets the data at about the same time it is written. But at the volume described above, the data arrives later and later (about 10 minutes behind after more than 1 hour of streaming).
I checked the logic of both Lambda functions and verified that the first Lambda does not add latency before pushing data to the stream. I also verified this from the stream records in the second Lambda, where the difference between approximateArrivalTimestamp and the current time clearly keeps increasing.
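A minimal sketch of this kind of check, assuming a Python handler and the standard Kinesis event shape that Lambda delivers (approximateArrivalTimestamp as epoch seconds under each record's "kinesis" key). `max_lag_seconds` is a hypothetical helper name.

```python
import time


def max_lag_seconds(event, now=None):
    """Return the age in seconds of the oldest record in the batch.

    `now` can be injected for testing; it defaults to the current time.
    """
    now = time.time() if now is None else now
    arrivals = [
        record["kinesis"]["approximateArrivalTimestamp"]
        for record in event.get("Records", [])
    ]
    if not arrivals:
        return 0.0
    return now - min(arrivals)


def handler(event, context):
    """Log the lag of each batch so its growth over time can be tracked."""
    lag = max_lag_seconds(event)
    print(f"batch_size={len(event.get('Records', []))} max_lag_s={lag:.1f}")
    return {"max_lag_seconds": lag}
```

If the logged lag grows steadily from batch to batch, records are waiting in the stream longer and longer before the trigger delivers them.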
Kinesis itself did not show any issues or throttling in the analytics (I am using 1 shard).
Are there any architectural changes I need to make for this to run more smoothly? I need to scale up at least 10 times in later benchmarks, e.g. 20 invocations of the first Lambda with 200 packets each and a timeout of 1 - 10 seconds.
I am using 100 as the batch size. Would increasing or decreasing it help?
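A back-of-the-envelope sketch of why the batch size could matter. It assumes each producer invocation adds 100 records, and that the trigger processes one batch at a time per shard (the default for a Kinesis event source). The 6 seconds per batch is a hypothetical figure chosen for illustration, not a measurement, and the function names are made up for this sketch.

```python
def ingest_rate(producers, records_per_call, interval_s):
    """Records written to the stream per second."""
    return producers * records_per_call / interval_s


def drain_rate(batch_size, seconds_per_batch, concurrent_batches=1):
    """Records consumed per second when batches are processed back to back."""
    return concurrent_batches * batch_size / seconds_per_batch


def lag_minutes_per_hour(ingest, drain):
    """Minutes of extra lag accumulated per hour when the consumer is too slow."""
    if drain >= ingest:
        return 0.0
    return 60 * (1 - drain / ingest)


# Setup from the question: 2 producers, 100 records each, every 10 seconds.
writes = ingest_rate(producers=2, records_per_call=100, interval_s=10)

# Hypothetical: each batch of 100 takes 6 seconds end to end.
reads = drain_rate(batch_size=100, seconds_per_batch=6)

print(f"ingest={writes:.1f} rec/s drain={reads:.1f} rec/s "
      f"lag_growth={lag_minutes_per_hour(writes, reads):.1f} min/hour")
```

Under these assumptions the consumer falls behind by roughly 10 minutes per hour, which is the order of magnitude observed. In this model a larger batch size only helps if the time per batch grows more slowly than the batch itself, i.e. if there is a fixed overhead per invocation.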
UPDATE: As I explored more online, I found suggestions to put an async, front-facing Lambda on the Kinesis trigger, which in turn invokes the actual Lambda asynchronously, so that Lambda processing time does not become the bottleneck. However, this approach also failed, as I have the same latency issue. I checked the execution time, and the front-facing Lambda finishes in 1 second. But I still get a big gap between approximateArrivalTimestamp and the current time in both Lambdas.
Please help!