6
votes

I'm trying to design a small message processing system based on SQS, Lambda, and SNS. In case of failure, I'd like for the message to be enqueued in a Dead Letter Queue (DLQ) and for a webhook to be called.

I'd like to know what the most canonical or reasonable way of achieving that would look like.

Currently, if everything goes well, the process should be as follows:

  1. SQS (in place to handle retries) enqueues a message
  2. Lambda gets invoked by SQS and processes the message
  3. Lambda sends a webhook and finishes normally
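The happy path above could be sketched roughly as follows. This is only a minimal illustration: `SUCCESS_WEBHOOK_URL`, `process`, and `post_json` are hypothetical names that do not come from the question, and the actual task is left as a placeholder.

```python
import json
import urllib.request

SUCCESS_WEBHOOK_URL = "https://example.com/hooks/success"  # hypothetical URL


def process(body):
    """Placeholder for the real task; raise an exception to signal failure."""
    return {"processed": body}


def post_json(url, payload):
    """POST a JSON payload; urlopen raises HTTPError on 4xx/5xx responses."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def handler(event, context, send=post_json):
    """Entry point for the SQS-triggered Lambda.

    Any exception raised here fails the invocation, so the messages become
    visible in the queue again after the visibility timeout and are retried.
    """
    results = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        result = process(body)
        send(SUCCESS_WEBHOOK_URL, result)  # step 3: success webhook
        results.append(result)
    return results
```

The `send` parameter exists only so the webhook call can be swapped out in tests; Lambda itself invokes `handler(event, context)`.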

If something in the Lambda goes wrong (the success webhook cannot be called, or the task at hand cannot be processed), the easiest way to achieve what I want seems to be to set up a first DLQ (DLQ1) that SQS would put the failed messages in. An auxiliary Lambda would then be invoked to process each message and pass it to SNS, which would call the failure webhook and also forward the message to DLQ2, the final/true DLQ.

Is that the best approach?

One alternative I know of is Alarms, though I've been warned that they are quite tricky. Another would be to have the Lambda itself call the error-reporting webhook if it fails on the last retry, although that somehow seems inappropriate.
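To make the second alternative concrete, here is a rough sketch of what "report on the last retry" might look like. It assumes the `ApproximateReceiveCount` attribute that SQS includes in each event record, and a hypothetical `MAX_RECEIVE_COUNT` constant that would have to be kept in sync with the queue's configuration by hand. `handle_record`, `process`, and `report_failure` are illustrative names, not part of any AWS API.

```python
MAX_RECEIVE_COUNT = 3  # assumption: must mirror the queue's maxReceiveCount


def is_last_attempt(record, max_receive_count=MAX_RECEIVE_COUNT):
    """Return True if this delivery is expected to be the final one."""
    receive_count = int(record["attributes"]["ApproximateReceiveCount"])
    return receive_count >= max_receive_count


def handle_record(record, process, report_failure):
    """Process one SQS record; report the failure only on the last attempt."""
    try:
        process(record["body"])
    except Exception as error:
        if is_last_attempt(record):
            report_failure(record["body"], str(error))
        raise  # re-raise so SQS still retries or dead-letters the message
```

The coupling between the code and the queue settings, plus the fact that the receive count is only approximate, is probably part of why this option feels inappropriate.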

Thanks!

1 Answer

9
votes

Your architecture looks good enough in case of success, but I personally find it quite confusing if anything goes wrong as I don't see why you need two DLQs to begin with.

Here's what I would do in case of failure:

  1. Define a DLQ on your source SQS queue and set its maxReceiveCount to, e.g., 3, meaning that a message that has been received three times without being processed successfully will be redirected to the configured DLQ.
  2. Create a Lambda that listens to this DLQ.
  3. Execute the webhook inside this Lambda.
  4. Since the message is automatically deleted from the DLQ once this Lambda has processed it successfully and, apparently, you want failed messages to be persisted somewhere, store the content of the message in a file on S3 and store the file metadata (bucket and key) in a DynamoDB table, so you can always query for failed messages.
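The four steps above could look something like the sketch below. Treat it as an illustration under assumptions rather than a definitive implementation: `FAILURE_WEBHOOK_URL`, `BUCKET`, `TABLE`, the `failed/<messageId>.json` key layout, and the helper names are all hypothetical. The `s3` and `dynamodb` arguments are expected to behave like boto3 clients (`put_object` and `put_item`). Archiving before calling the webhook is my own ordering choice, not something the steps prescribe.

```python
import json
import urllib.request

FAILURE_WEBHOOK_URL = "https://example.com/hooks/failure"  # hypothetical URL
BUCKET = "failed-messages-bucket"  # hypothetical bucket name
TABLE = "failed-messages"  # hypothetical table name


def redrive_policy(dlq_arn, max_receive_count=3):
    """Step 1: JSON string for the source queue's RedrivePolicy attribute."""
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )


def call_webhook(url, payload):
    """Step 3: POST the failure notification as JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def archive(record, s3, dynamodb):
    """Step 4: store the body in S3 and its location in DynamoDB."""
    key = f"failed/{record['messageId']}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=record["body"].encode("utf-8"))
    dynamodb.put_item(
        TableName=TABLE,
        Item={
            "messageId": {"S": record["messageId"]},
            "bucket": {"S": BUCKET},
            "key": {"S": key},
        },
    )
    return key


def handler(event, context, s3=None, dynamodb=None, notify=call_webhook):
    """Step 2: Lambda subscribed to the DLQ."""
    if s3 is None or dynamodb is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        s3 = s3 or boto3.client("s3")
        dynamodb = dynamodb or boto3.client("dynamodb")
    keys = []
    for record in event["Records"]:
        key = archive(record, s3, dynamodb)
        notify(FAILURE_WEBHOOK_URL, {"messageId": record["messageId"], "s3Key": key})
        keys.append(key)
    return keys
```

The string returned by `redrive_policy` is what you would pass as the `RedrivePolicy` entry in the `Attributes` of the source queue, for instance via the SQS client's `set_queue_attributes` call or the equivalent setting in your infrastructure template.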

I don't see any role for SNS here unless you want multiple subscribers for a given message, which doesn't appear to be the case.

This way, you need to maintain only one DLQ, and you can get rid of SNS, as it only adds an extra layer of complexity to your architecture.