1 vote

I have the below function handler code.

public async Task FunctionHandler(SQSEvent evnt, ILambdaContext context)
{
    foreach (var message in evnt.Records)
    {
        // Do work
        // If a message is processed successfully, delete the SQS message
        // If a message fails to process, throw an exception
    }
}

What is very confusing is that, while I don't have validation logic that checks whether a record already exists before creating it in my database, I see database records with the same ID created twice, meaning the same message was processed more than once!

In my code, I delete the message after successful processing, or throw an exception upon failure, assuming the remaining ordered messages will simply go back to the queue and become visible for any consumer to reprocess. But I can now see the code failing because the same records are created twice for an event that succeeded.
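In other words, the intended flow is roughly this (a simplified sketch; ProcessMessageAsync is a placeholder for the actual work, and DeleteSqsMessageAsync wraps the delete request shown further down):

public async Task FunctionHandler(SQSEvent evnt, ILambdaContext context)
{
    foreach (var message in evnt.Records)
    {
        // Placeholder for the real work; on failure this throws, the Lambda
        // invocation fails, and the remaining (not yet deleted) messages
        // become visible again after the visibility timeout for reprocessing.
        await ProcessMessageAsync(message);

        // Only delete the message once it has been processed successfully
        // (the actual DeleteMessageRequest is shown below).
        await DeleteSqsMessageAsync(message);
    }
}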

Is AWS SQS FIFO exactly-once delivery, or am I missing some kind of retry processing policy?

This is how I delete the message upon successful processing.

var deleteMessageRequest = new DeleteMessageRequest
{
    QueueUrl = _sqsQueueUrl,
    ReceiptHandle = message.ReceiptHandle
};

var deleteMessageResponse =
    await _amazonSqsClient.DeleteMessageAsync(deleteMessageRequest, cancellationToken);

if (deleteMessageResponse.HttpStatusCode != HttpStatusCode.OK)
{
    throw new AggregateSqsProgramEntryPointException(
        $"Amazon SQS DELETE ERROR: {deleteMessageResponse.HttpStatusCode}\r\nQueueURL: {_sqsQueueUrl}\r\nReceiptHandle: {message.ReceiptHandle}");
}

The documentation is very explicit about this

"FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it."

They also mention protecting your code from retries, which is confusing for an exactly-once delivery queue type, and then I see the following in their documentation, which confuses me further.

Exactly-once processing. Unlike standard queues, FIFO queues don't introduce duplicate messages. FIFO queues help you avoid sending duplicates to a queue. If you retry the SendMessage action within the 5-minute deduplication interval, Amazon SQS doesn't introduce any duplicates into the queue.
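For reference, the deduplication described here happens on the producer side. A minimal sketch of sending to a FIFO queue (reusing the _amazonSqsClient and _sqsQueueUrl fields from above; the body and IDs are placeholders): if SendMessage is retried with the same MessageDeduplicationId within the 5-minute interval, SQS accepts the call but does not enqueue a second copy.

var sendMessageRequest = new SendMessageRequest
{
    QueueUrl = _sqsQueueUrl,
    MessageBody = "{\"id\":\"abc-123\"}",   // placeholder payload
    MessageGroupId = "abc-123",             // required for FIFO queues; ordering is per group
    MessageDeduplicationId = "abc-123"      // retries with the same ID within 5 minutes are not enqueued again
};

var sendMessageResponse =
    await _amazonSqsClient.SendMessageAsync(sendMessageRequest, cancellationToken);

Alternatively, if ContentBasedDeduplication is enabled on the queue, the deduplication ID is derived from a SHA-256 hash of the message body instead of being supplied explicitly.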

Consumer retries (how's this possible)?

If the consumer detects a failed ReceiveMessage action, it can retry as many times as necessary, using the same receive request attempt ID. Assuming that the consumer receives at least one acknowledgement before the visibility timeout expires, multiple retries don't affect the ordering of messages.
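Note that this part only applies when your own code calls ReceiveMessage; with a Lambda event source mapping, the Lambda service polls the queue for you. For completeness, a rough sketch of such a retry with the .NET SDK, reusing _sqsQueueUrl and _amazonSqsClient from above (the attempt ID value is a placeholder):

var receiveMessageRequest = new ReceiveMessageRequest
{
    QueueUrl = _sqsQueueUrl,
    MaxNumberOfMessages = 10,
    WaitTimeSeconds = 20,
    // FIFO only: retrying with the same attempt ID returns the same batch of
    // messages, so a failed or timed-out receive can be retried safely
    // without affecting message ordering.
    ReceiveRequestAttemptId = "attempt-1"
};

var receiveMessageResponse =
    await _amazonSqsClient.ReceiveMessageAsync(receiveMessageRequest, cancellationToken);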

Have you explicitly seen the same record processed twice by the Lambda function? For example, can you find it in the logs? What do you use to uniquely identify the record? Do both messages have the same MessageGroupId? – John Rotenstein
It sounds like either (1) you're committing the database transaction before the operation successfully completes, or (2) you're not providing SQS with the information needed to dedupe input records. But you don't show the code that does either of these things, so it's impossible to tell which is happening. – Parsifal
I actually solved the issue by grouping and ordering the records internally in the function, since records with the same MessageGroupId sometimes arrive in the wrong logical application order. Why the duplicate records appeared is still an open question, but I can't reproduce it anymore. @JohnRotenstein I actually didn't validate this. @Parsifal I need to commit to the database before the operation (function) completes. Deduplication is on the AWS side, but I have now set the FIFO deduplication scope to MessageGroupId rather than the queue level. – George Taskos

1 Answer

0 votes

This was entirely our application's error, in how we treat the event-sourcing aggregate endpoints, which are not thread-safe.
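As mentioned in the comments, the practical fix was to group and order the records inside the handler so that each aggregate's messages are applied sequentially and in logical order. A rough sketch of that idea (MessageGroupId and SequenceNumber are standard SQS FIFO message attributes; ApplyToAggregateAsync is a placeholder for the aggregate update):

// Requires: using System.Linq; using System.Numerics;
// Group the batch by MessageGroupId and apply each group's messages in
// SequenceNumber order, so events for the same aggregate are never applied
// concurrently or out of order.
var groups = evnt.Records
    .GroupBy(r => r.Attributes["MessageGroupId"])
    .Select(g => g.OrderBy(r => BigInteger.Parse(r.Attributes["SequenceNumber"])));

foreach (var group in groups)
{
    foreach (var message in group)
    {
        await ApplyToAggregateAsync(message); // placeholder for the aggregate update
    }
}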