0
votes

I have an SQS queue that continuously receives a huge number of messages.

And I have a use case where, once the number of messages in the queue reaches X (say 1,000), the system needs to trigger an event to process those 1,000 messages at one time.

The system should produce a series of triggers, each trigger covering 1,000 messages.

For example, if there are 2,300 messages on the queue, we expect 3 triggers to a Lambda function: the first 2 triggers corresponding to 1,000 messages each, and the last one containing the remaining 300 messages.

From my research, a CloudWatch Alarm can hook into the SQS "NumberOfMessagesReceived" metric and publish to SNS. But I don't know how to configure a series of alarms that fire for every 1,000 messages.

Please advise whether AWS can support this use case, or what customization we could make to achieve it.

How do you plan on executing 1K messages at a time? SQS only allows consumers to fetch 10 messages at once. - Thales Minussi
For the Lambda, we don't want to process 1,000 messages one by one; we would like to have 1 SQS event whose body is the content of 1,000 messages. - Nghia Do
Whatever you are trying to achieve seems to be conflicting. If you are sending ONE message and this message has a body of 1000 attributes, for SQS it still is one message, so you'll never be able to configure an Alarm like that. Also keep in mind that your message has a limit of 256KB, depending on your objects size, this limit is going to be exceeded and you'll lose the message. Am I still missing something? - Thales Minussi
Hi @Thales, maybe my explanation was not clear enough; I have copied here the same comment I left for Chris. We have a process (process A) which keeps generating files in S3. The process we are trying to build (process B) has to group 1,000 of those files together to perform another piece of business logic. After process A generates a file in S3, it sends an SQS message. Process B must work on a batch of 1,000 files; we don't want process B to process files one at a time. - Nghia Do
So, from what I understand, it's not one message with 1000 attributes. It would be 1000 messages and you want to process them at once, correct? - Thales Minussi

3 Answers

1
votes

So, after going through some clarifications in the comments section with the OP, here's my answer (combined with @ChrisPollard's comment):

Achieving what you want with SQS is impossible, because every batch can only contain up to 10 messages. Since you need to process 1000 messages at once, this is definitely a no-go.

@ChrisPollard suggested creating a new record in DynamoDB every time a new file is pushed to S3. This is a very good approach. Increment the partition key by 1 every time and trigger a Lambda through DynamoDB Streams. In your function, run a check against the partition key and, if it equals 1,000, run a query against your DynamoDB table filtering the last 1,000 updated items (you'll need a Global Secondary Index on your createdAt field). Map these items (or use Projections) to create a very minimal JSON that contains only the necessary information. Something like:

[
    {
     "key": "my-amazing-key",
     "bucket": "my-super-cool-bucket"
    },
    ...
]

A JSON entry like this is only 87 bytes long (if you take the square brackets out of the game, because they won't be repeated, you're left with 83 bytes). Even if you round it up to 100 bytes per entry, you can still successfully send 1,000 of them as one SQS message, as that's only around 100KB of data, well under the 256KB message size limit.

Then have one Lambda function subscribe to that SQS queue and finally concatenate the 1,000 files.
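
To make the Streams-triggered part more concrete, here is a minimal sketch in Python. Every name in it is a hypothetical assumption, not part of the original answer: a table called files, an incrementing batchId attribute, a GSI called CreatedAtIndex whose hash key is a constant pk = "FILE" attribute with createdAt as the sort key (the GSI needs some constant hash key so the newest 1,000 items can be queried in one go), and a placeholder queue URL. It also assumes the stream is configured with the NEW_IMAGE view type.

import json

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")

TABLE_NAME = "files"         # hypothetical table that gets one item per new S3 file
GSI_NAME = "CreatedAtIndex"  # hypothetical GSI: constant pk, createdAt as sort key
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-batches"  # placeholder
BATCH_SIZE = 1000


def handler(event, context):
    table = dynamodb.Table(TABLE_NAME)

    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue

        # batchId is the counter that is incremented for every new file
        batch_id = int(record["dynamodb"]["NewImage"]["batchId"]["N"])

        # Only fire on every 1000th insert (point 2 under "Things to keep in mind")
        if batch_id % BATCH_SIZE != 0:
            continue

        # Fetch the 1,000 most recently created items via the GSI
        resp = table.query(
            IndexName=GSI_NAME,
            KeyConditionExpression=Key("pk").eq("FILE"),
            ScanIndexForward=False,  # newest first, ordered by createdAt
            Limit=BATCH_SIZE,
        )

        # Keep only the fields the downstream function needs
        payload = [{"key": i["key"], "bucket": i["bucket"]} for i in resp["Items"]]

        # ~100 bytes per entry keeps 1,000 entries well under the 256KB message limit
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))

Note that a single query page stops at 1MB of data, so if your items are larger than in this example you may need to paginate with LastEvaluatedKey until you have the full 1,000 items.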

Things to keep in mind:

  1. Make sure you really create the createdAt field in DynamoDB. By the time the counter hits one thousand, new items could have been inserted, so sorting by createdAt makes sure you are reading the 1,000 items that you expected.

  2. For the check in your Lambda, just test batchId % 1000 == 0; this way you don't need to delete anything, saving DynamoDB operations.

  3. Watch out for the execution time of your Lambda. Concatenating 1,000 files at once may take a while to run, so run a couple of tests and put a 1 min overhead on top of it. I.e., if it usually takes 5 mins, set your function's timeout to 6 mins.

If you have new info to share I am happy to edit my answer.

0
votes

You can add alarms at 1k, 2k, 3k, etc., but that seems clunky.

Is there a reason you're letting the messages batch up? You can make this trigger event-based (when a queue message is added, fire my Lambda) and get rid of the complications of batching them.
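
If the event-based route works for you, wiring the queue straight to the function is just an event source mapping. A minimal sketch with boto3, where the queue ARN and function name are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Placeholder queue ARN and function name -- substitute your own
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue",
    FunctionName="process-message",
    BatchSize=10,  # up to 10 messages per invocation for an SQS trigger
)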

0
votes

I handled a very similar situation recently: process A puts objects in an S3 bucket and, every time it does, it puts a message in SQS with the key and bucket details. I have a Lambda which is triggered every hour, but it could be any trigger, like your CloudWatch alarm. Here is what you can do on every trigger:

  • Read the messages from the queue. SQS only allows you to read 10 messages at a time, so every time you read, keep adding them to some list in your Lambda. You also get a receipt handle for every message, which you can use to delete the messages. Repeat this process until you have read all 1,000 messages in your queue. Now you can perform whatever operations are required on your list and feed it to process B in a number of different ways, like a file in S3 and/or a new queue that process B can read from.

  • Alternate approach to reading messages: since SQS only returns 10 messages at a time, you can send the optional parameter VisibilityTimeout=60, which hides the read messages from the queue for 60 seconds, and keep reading recursively until you don't see any more messages, all while adding them to a list in the Lambda to process later (see the sketch below). This can be tricky, since you have to try out different values for the visibility timeout based on how long it takes to read 1,000 messages. Once you know you have read all the messages, you can simply take the receipt handles and delete all of them. You could also purge the queue, but then you may delete messages that arrived during this process and were never read.
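
A rough sketch of that visibility-timeout approach in Python with boto3. The queue URL is a placeholder, and it assumes 60 seconds is long enough to finish reading the whole backlog:

import boto3

sqs = boto3.client("sqs")

# Placeholder queue URL -- substitute the queue that process A writes to
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/process-a-events"


def drain_queue():
    """Read everything currently visible on the queue, then delete it."""
    messages = []
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,  # SQS hard limit per receive call
            VisibilityTimeout=60,    # hide already-read messages while we keep reading
            WaitTimeSeconds=1,
        )
        batch = resp.get("Messages", [])
        if not batch:
            break  # nothing visible is left; assumes 60s covers the whole loop
        messages.extend(batch)

    # ... hand `messages` (or just their bodies) to process B here ...

    # Delete in batches of 10 using the receipt handles collected above
    for i in range(0, len(messages), 10):
        chunk = messages[i:i + 10]
        sqs.delete_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": str(j), "ReceiptHandle": m["ReceiptHandle"]}
                for j, m in enumerate(chunk)
            ],
        )
    return messages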