I've been running some tests and talking to AWS tech support at the same time.
What I believe at the moment is that:
Amazon Simple Queue Service supports an initial burst of 5 concurrent function invocations and increases concurrency by 60 concurrent invocations per minute. Doc
1/ The thing that does the polling is a separate entity. It is most likely a Lambda function that long-polls the SQS queue and then invokes our Lambda functions.
2/ That Pool-Lambda does not take our Receiver-Lambda into account at all. It does not care whether the function is running at max capacity, or how much concurrency is still available for the Receiver-Lambda.
3/ Because of that combination, the behavior is not what we expected from the Lambda-SQS integration. Worse, if millions of messages suddenly burst into your queue, the Receiver-Lambda concurrency can never catch up with the number of messages the Pool-Lambda is sending, resulting in lost work.
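To put rough numbers on the mismatch, here is a back-of-the-envelope sketch in Python. It assumes the documented ramp (burst of 5, plus 60 per minute) grows linearly, i.e. about one extra invocation per second; the docs quoted above do not say whether the increase is linear or stepped, so treat that as my assumption. The function names are made up for illustration, and the 1000 ceiling is the hard limit that support mentions later in this answer.

```python
def poller_concurrency(seconds, burst=5, ramp_per_minute=60, ceiling=1000):
    """Concurrency the poller would attempt after `seconds`.

    Assumption: the ramp is linear (60 per minute is about 1 per second).
    """
    return min(ceiling, burst + ramp_per_minute * seconds // 60)


def seconds_until_exceeds(reserved, burst=5, ramp_per_minute=60, ceiling=1000):
    """First whole second at which the attempted concurrency exceeds `reserved`.

    Returns None if the ceiling means it can never be exceeded.
    """
    if reserved >= ceiling:
        return None
    t = 0
    while poller_concurrency(t, burst, ramp_per_minute, ceiling) <= reserved:
        t += 1
    return t


if __name__ == "__main__":
    # With a reserved concurrency of 50, how long until the poller overshoots?
    print(seconds_until_exceeds(50))
```

Under these assumptions a function capped at 50 would be overshot in well under a minute, and since each invocation in my test holds its slot for 30 seconds, the extra invocations have nowhere to go.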
The test:
- Create one Lambda function that takes 30 seconds to return true;
- Set that function's concurrency to 50;
- Push 300 messages into the queue (visibility timeout: 10 minutes, batch message count: 1, no re-drive)
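For anyone who wants to reproduce this, a minimal sketch of the setup could look like the following. This is not the exact code I ran: the message body, the helper names and the injectable `sleep` argument (only there so the handler can be exercised without actually waiting) are illustrative. The reserved concurrency, visibility timeout and batch size are configured on the function, the queue and the event source mapping, not in this code.

```python
import json
import time

PROCESSING_SECONDS = 30  # the test function takes 30 seconds per message
SQS_MAX_BATCH = 10       # SendMessageBatch accepts at most 10 entries per call


def handler(event, context, sleep=time.sleep):
    """Receiver-Lambda: simulate 30 seconds of work, then return True."""
    sleep(PROCESSING_SECONDS)
    return True


def build_batches(total, batch_size=SQS_MAX_BATCH):
    """Split `total` test messages into SendMessageBatch-sized entry lists."""
    entries = [
        {"Id": str(i), "MessageBody": json.dumps({"n": i})}
        for i in range(total)
    ]
    return [entries[i:i + batch_size] for i in range(0, total, batch_size)]


def push_messages(sqs_client, queue_url, total=300):
    """Send the test messages; returns the number of batches sent."""
    batches = build_batches(total)
    for entries in batches:
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
    return len(batches)
```

With boto3 installed, the producer side would be called as `push_messages(boto3.client("sqs"), queue_url)`, where `queue_url` is the URL of your own test queue.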
The result:
- The number of messages available just increases gradually
- At first, there are few enough messages to be processed by Receiver-Lambda
- After half a minute, there are more messages available than Receiver-Lambda can handle
- These messages would be discarded to the dead-letter queue, because Pool-Lambda is unable to invoke Receiver-Lambda
I will update this answer as soon as I get confirmation from AWS support.
Support answer, as of Q1 2019 (TL;DR version):
1/ The assumption was correct: there is a "Poller"
2/ That Poller does not take reserved concurrency into consideration as part of its algorithm
3/ That Poller has a hard limit of 1000
Q2 2019:
The above information needs to be updated. Support said that the poller correctly considers reserved concurrency, but it should be at least 5. The SQS-Lambda integration is still being updated and this answer will not be, so please consult AWS if you run into any weird issues.
...the handler can return a value...if you use the Event invocation type (asynchronous execution), the value is discarded.
I checked the analytics; the Lambda instances did NOT time out. – user1187968