2
votes

I'm using SQS as a queue for video encoding and want to ensure that only a single encoding is performed per video.

SQS works fine in that when a message is queued, it will only be received by a single thread. However, it's possible that multiple messages could be sent to the queue for the same video/encoding, meaning the message content would be the same for the particular 'encoding' queue.

Is there any way to de-duplicate, so that for a specific queue the messages in it (or received from it) are unique?

One option I thought of would be to create a new queue for each encoding as the message is sent. The queue could be named something like encoding-video-id, would only ever hold a single message, and I could check that the queue does not yet exist before sending. The only "issue" is that there could be thousands to tens of thousands of these queues created.

So what could cause you to enqueue the same message multiple times? – Mike Brant
The use case is that users can submit 'encode', which queues the video; in edge cases it's possible for it to be hit multiple times, which would result in multiple messages. – dzm
Just noticed you can create "unlimited" queues in SQS, so possibly the option above could work. – dzm
Even without the possibility of a user queuing a duplicate task, SQS itself does not guarantee "exactly once" delivery of a message. It guarantees "at least once", so SQS itself can deliver duplicate messages. I think the answers to these questions are relevant to your issue: stackoverflow.com/questions/32386877/… and stackoverflow.com/questions/13484845/… – Mark B
@mbaird I think this will end up being what needs to be done. Basically using atomic operations in Redis and setting a lower TTL on it (which is updated while being processed). Could simply use INCR with a unique key based on the video GUID and check if it exists or not. If the TTL on this is say 20s and the SQS visibility timeout is 1m, both being updated every 10s while a job is being processed, I think that should solve the issues of dedup and also allow for retries from SQS. – dzm
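A minimal sketch of that Redis approach, assuming StackExchange.Redis; the key name, GUID and TTL values are placeholders. The first INCR on the key returns 1, so only the first request queues the job:

using System;
using StackExchange.Redis;

class EncodeDedup
{
    static void Main()
    {
        // Assumes a local Redis instance; adjust the connection string as needed.
        ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost");
        IDatabase db = redis.GetDatabase();

        string videoGuid = "3f2c9a2e-example";        // hypothetical video GUID
        string key = "encoding:" + videoGuid;

        // Atomic INCR: the first caller sees 1, any duplicate sees a value > 1.
        long count = db.StringIncrement(key);
        if (count == 1)
        {
            // TTL shorter than the SQS visibility timeout, refreshed while encoding runs.
            db.KeyExpire(key, TimeSpan.FromSeconds(20));
            Console.WriteLine("First request - queue the encoding job.");
        }
        else
        {
            Console.WriteLine("Duplicate request - skip queuing.");
        }
    }
}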

5 Answers

3
votes

IMO, creating an unlimited number of queues with a single message in each is a really bad design, even if it would theoretically work.

If it were me, I'd try to make sure each video had some sort of unique identifier that stays the same even if the user 'double-clicked' the process button.

I would envision a system where the video, with a unique name (such as a GUID), is uploaded to S3, a message is put on the queue, your threads pick up the message from the queue and do the encoding, and then the video is written back to a different S3 bucket, but with the same base name.

Before processing any video, I would first check the 'output bucket' to see if there is already an encoded video there with the matching name, and if there is, skip the reprocessing and delete the message.
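A rough sketch of that "check the output bucket first" step, assuming the AWS SDK for .NET; the bucket and key names are hypothetical. GetObjectMetadata throws a 404 AmazonS3Exception when no encoded object with that name exists yet:

using System;
using System.Net;
using Amazon.S3;

class OutputBucketCheck
{
    // Returns true if an encoded video with the matching name already exists.
    static bool AlreadyEncoded(IAmazonS3 s3, string outputBucket, string key)
    {
        try
        {
            s3.GetObjectMetadata(outputBucket, key);
            return true;                  // object found - skip reprocessing, delete the message
        }
        catch (AmazonS3Exception e) when (e.StatusCode == HttpStatusCode.NotFound)
        {
            return false;                 // not encoded yet - go ahead and process
        }
    }

    static void Main()
    {
        IAmazonS3 s3 = new AmazonS3Client(Amazon.RegionEndpoint.APSouth1);
        // Hypothetical bucket and key names, for illustration only.
        if (AlreadyEncoded(s3, "my-encoded-videos", "3f2c9a2e-example.mp4"))
            Console.WriteLine("Already encoded - delete the SQS message and skip.");
        else
            Console.WriteLine("Not encoded yet - process the message.");
    }
}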

If everything is running on an EC2 local disk (and you are not using S3), then the same could be done using an input and output directory on the hard disk (but that would assume that multiple machines aren't doing the processing).

It's important to remember that it's possible for the same message to be delivered by SQS more than once, even if the user only submitted it once. It happens, though rarely, so whatever system you set up, you need to make sure that if/when you do get the occasional duplicate it doesn't break anything.

2
votes

There is no way to ensure the uniqueness of messages in an SQS Queue, or ordering for that matter. Also, having too many queues isn't a good idea.

In my opinion, you need to add another component to your system. A metadata service of some kind would suffice. It could work something like this:

  • When you create an encoding task (before adding it to SQS), you would write it to your metadata service.
  • When a worker receives an encoding task, it would query the metadata service to see if the task has already been completed.
  • When a worker completes an encoding task, it would mark the task as completed in the metadata service.

If you're uploading the outputs of these encoding jobs to S3, you could effectively use S3 itself as the metadata service. If each video has a unique name/id, you could save the output in S3 with that unique id as the key, or set it as an S3 object-metadata key-value (this would make the file a little harder to find, since you can't query object metadata without knowing the key). Then, when a worker receives an encoding task, it would check whether the file already exists on S3, in which case it would delete the message from SQS and skip the task.

If you're not saving the outputs to S3, you'll probably need to employ a database of some kind. DynamoDB could probably be helpful in terms of speed and cost.
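If DynamoDB were the metadata store, the worker-side check from the list above might look roughly like this (hypothetical EncodingTasks table keyed on VideoId, with a Status attribute; AWS SDK for .NET assumed):

using System;
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class MetaDataCheck
{
    static void Main()
    {
        var dynamo = new AmazonDynamoDBClient(Amazon.RegionEndpoint.APSouth1);

        // Look the task up before encoding; table and attribute names are hypothetical.
        GetItemResponse response = dynamo.GetItem(new GetItemRequest
        {
            TableName = "EncodingTasks",
            Key = new Dictionary<string, AttributeValue>
            {
                { "VideoId", new AttributeValue { S = "3f2c9a2e-example" } }
            },
            ConsistentRead = true   // avoid acting on a stale read
        });

        bool completed = response.Item != null
            && response.Item.ContainsKey("Status")
            && response.Item["Status"].S == "Completed";

        if (completed)
            Console.WriteLine("Task already completed - delete the SQS message and skip.");
        else
            Console.WriteLine("Task not completed - run the encoding, then mark it Completed.");
    }
}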

Hope this helps! :)

1
votes

Your suggested solution is a bad design, whether or not it is technically possible. Here is my approach to the problem.

I would use a database (probably DynamoDB) to store a unique id based on the video and its encoding type, and add a column called status. As soon as the user clicks the convert button, first check the database. If the item is not there, push a new record to the database with the status "Converting", then push the work into SQS. After processing the workload, change the status in the database to "Finished". If the user clicks the convert button again, show the result based on the status value in the database.
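A hedged sketch of that flow using DynamoDB conditional writes, with a hypothetical Videos table and attribute names; the conditional put only succeeds for the first click, so a second click never enqueues a duplicate job:

using System;
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class ConvertButtonHandler
{
    static void Main()
    {
        var dynamo = new AmazonDynamoDBClient(Amazon.RegionEndpoint.APSouth1);
        string videoId = "3f2c9a2e-example-h264";   // unique id per video + encoding type

        try
        {
            // Insert the record only if it does not already exist - the first click wins.
            dynamo.PutItem(new PutItemRequest
            {
                TableName = "Videos",
                Item = new Dictionary<string, AttributeValue>
                {
                    { "VideoId", new AttributeValue { S = videoId } },
                    { "Status",  new AttributeValue { S = "Converting" } }
                },
                ConditionExpression = "attribute_not_exists(VideoId)"
            });
            Console.WriteLine("New record created - push the work into SQS here.");

            // ... after the worker finishes encoding, flip the status:
            dynamo.UpdateItem(new UpdateItemRequest
            {
                TableName = "Videos",
                Key = new Dictionary<string, AttributeValue>
                {
                    { "VideoId", new AttributeValue { S = videoId } }
                },
                UpdateExpression = "SET #s = :finished",
                ExpressionAttributeNames = new Dictionary<string, string> { { "#s", "Status" } },
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    { ":finished", new AttributeValue { S = "Finished" } }
                }
            });
        }
        catch (ConditionalCheckFailedException)
        {
            Console.WriteLine("Record already exists - show the current status instead of re-queuing.");
        }
    }
}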

1
votes

There is a way, though, to keep only the unique messages after receiving data from the queue. I will explain it below.

Let's say you are frequently adding messages (irrespective of any id or anything else) to a single SQS queue. The logic below applies at the time of receiving messages from the queue.

While creating the ReceiveMessageRequest object, you can specify the AttributeNames. So, add the "ApproximateReceiveCount" attribute to the request object. That will fetch the "ApproximateReceiveCount" value along with each message read from the SQS queue.

Now, for messages that are being read for the first time, the "ApproximateReceiveCount" is 1; otherwise the value will be greater than 1. So you can consider only those messages each time you do an SQS read. Just limit the maximum number of messages read each time by setting the "MaxNumberOfMessages" property of the request object, to make sure you don't get a huge payload on each read (each 64 KB chunk of a payload is billed as 1 request).

I know a FIFO queue would do a much better job in some cases. But it has a few limitations:

  • It has limited throughput (only 300 transactions per second (TPS)).
  • Currently it is supported in only two regions (US West (Oregon) and US East (Ohio)).

Please find the C# code below illustrating the logic:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

namespace DriverDataPooler1
{
    class Program
    {
        AmazonSQSClient objClient = new AmazonSQSClient
                ("<AWSAccessKeyId>", "<AWSSecretAccessKey>", Amazon.RegionEndpoint.APSouth1);
        // Response object used when listing the existing queues
        ListQueuesResponse objqueuesResponseList = new ListQueuesResponse();

        // Declare the request and response objects
        ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest();
        ReceiveMessageResponse receiveMessageResponse = new ReceiveMessageResponse();

        static void Main(string[] args)
        {
            Program p1 = new Program();
            p1.getQueueData();
        }

        public void getQueueData(){

            objqueuesResponseList = objClient.ListQueues(new ListQueuesRequest());
            List<String> QueueList = objqueuesResponseList.QueueUrls;

            // Receive messages from the first SQS queue
            if (QueueList.Any())
            {
                // I am only considering the first queue here as I have only one SQS queue
                receiveMessageRequest.QueueUrl = QueueList[0];
                receiveMessageRequest.WaitTimeSeconds = 20;

                // You can limit the number of messages to decrease the payload size (depends on the size of each message)
                receiveMessageRequest.MaxNumberOfMessages = 10;
                receiveMessageRequest.AttributeNames = new List<string>() { "ApproximateReceiveCount" };
                receiveMessageResponse = objClient.ReceiveMessage(receiveMessageRequest);
                List<Message> result = receiveMessageResponse.Messages;
                if (result.Any())
                {
                    foreach (Message res in result)
                    {
                        // Checking for the messages that are read for the first time
                        if (Int16.Parse(res.Attributes["ApproximateReceiveCount"]) == 1)
                        {
                            // Process your messages here
                            Console.WriteLine(res.Body);
                        }
                    }
                }
                else
                {
                    Console.WriteLine("You have no new messages in your SQS");
                }
            }
            else
            {
                Console.WriteLine("You have no available SQS");
            }
            Console.ReadKey();

        }
    }
}

Please comment if you have any further queries.

1
votes

SQS FIFO queues have a deduplication ID property. Messages sent with the same deduplication ID within a 5-minute window will be accepted successfully, but not actually added to the queue.

You can use this to prevent extra queuing of the same video.

There is some added complexity: even if the message has already been processed, additional messages with the same deduplication ID won't be queued until the window has elapsed. Likewise, if you send the same ID after the window has elapsed, the message will be queued again, which may also be undesired.

However, rather than maintaining your own record of queued videos, the deduplication ID should give you the behaviour you are asking for.
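A small sketch of sending to a FIFO queue with a deduplication ID, using the same AWS SDK for .NET as the answer above; the queue URL and ids are hypothetical, and the queue itself must be created as a .fifo queue:

using System;
using Amazon.SQS;
using Amazon.SQS.Model;

class FifoSender
{
    static void Main()
    {
        var sqs = new AmazonSQSClient(Amazon.RegionEndpoint.APSouth1);

        var request = new SendMessageRequest
        {
            // Hypothetical FIFO queue URL - the queue must be created with the .fifo suffix.
            QueueUrl = "https://sqs.ap-south-1.amazonaws.com/123456789012/video-encoding.fifo",
            MessageBody = "{\"videoId\":\"3f2c9a2e-example\",\"profile\":\"h264\"}",
            MessageGroupId = "video-encoding",
            // Duplicates of this id within the 5-minute window are accepted but not enqueued again.
            MessageDeduplicationId = "3f2c9a2e-example-h264"
        };

        SendMessageResponse response = sqs.SendMessage(request);
        Console.WriteLine("MessageId: " + response.MessageId);
    }
}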