We are building a system to precalculate all prices for all our customers. Based on some trigger (e.g. price list change) in our ERP we will put all affected customer numbers in a queue and an Azure Function listening to that queue will recalculate the prices for that specific customer.

Example: A change is made to a price list and the user clicks Save. 3000 customers are affected by this change and are added to the calculation queue. If a calculation takes two seconds and we can run 10 calculations in parallel, the last customer's prices would be done after 3000*2/10=600 seconds. While a customer number is still waiting in the queue, the user makes another change and clicks Save. In this case we would like to avoid adding any customer number that is already present in the queue.

Question: Azure Service Bus Queue has a duplicate detection feature, but that is time based. Is there some other means to avoid adding a message to a queue if another message with the same content is already in the queue?

NB: All other resources are in Azure so we are only looking at Azure based queues and eventing solutions.

1 Answer

Azure Service Bus Queue has a duplicate detection feature, but that is time based. Is there some other means to avoid adding a message to a queue if another message with the same content is already in the queue?

Duplicate detection is based on the message ID within a time window. If you want content-based deduplication, you will need to set the message ID to a hash of the content. That way identical content always generates the same message ID, and subsequent messages with that ID sent within the window will be dropped. I've described the idea in detail in my post.
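
A minimal sketch of the content-hash idea, in Python. The payload shape (`customerNumber`) and the helper name `content_message_id` are made up for illustration; only the standard library is used here. With the `azure-servicebus` package, the resulting string would be passed as the `message_id` when constructing the message, which is not shown.

```python
import hashlib
import json


def content_message_id(payload: dict) -> str:
    """Derive a deterministic message ID from the message content.

    Hypothetical helper: the payload is serialized canonically (sorted keys,
    no extra whitespace) so that equal content always yields the same hash.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    # SHA-256 in hex is 64 characters, short enough for a message ID.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Assumed payload shape: one message per affected customer number.
first = content_message_id({"customerNumber": "10042"})
second = content_message_id({"customerNumber": "10042"})
other = content_message_id({"customerNumber": "10043"})

print(first == second)  # same content, same ID
print(first == other)   # different content, different ID
```

Serializing with sorted keys matters: without it, two payloads with the same fields in a different order would hash differently and would not be treated as duplicates.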

Update

From the comment to the answer:

Does this work for only removed stuff on the queue? If my message was processed 5 minutes ago I definitely want to rerun it. But if my message is still in the queue I don't want to add it again.

This is not how de-duplication works. It doesn't care whether the message is still in the queue or not. It looks at the message ID and checks whether a message with the identical ID has been sent to the queue within the de-duplication time window. If the answer is "yes", the new message is considered a duplicate and is discarded. If the time window since the last message with that ID has elapsed, the message is not considered a duplicate and is enqueued for processing.
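
To make the behavior concrete, here is a toy model of time-window de-duplication in Python. It is only an illustration of the rule described above, not the broker's actual implementation; the class name `DedupWindow`, the 600-second window, and the message ID are assumptions.

```python
class DedupWindow:
    """Toy model: remember when each message ID was last accepted."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._accepted_at = {}  # message ID -> time it was last accepted

    def accept(self, message_id: str, now: float) -> bool:
        """Return True if the message is enqueued, False if it is dropped."""
        last = self._accepted_at.get(message_id)
        if last is not None and now - last < self.window_seconds:
            # Same ID seen inside the window: dropped, regardless of
            # whether the earlier message is still waiting in the queue.
            return False
        self._accepted_at[message_id] = now
        return True


window = DedupWindow(window_seconds=600)
results = [
    window.accept("customer-10042", now=0),    # first send
    window.accept("customer-10042", now=300),  # 5 minutes later, inside window
    window.accept("customer-10042", now=601),  # window has elapsed
]
print(results)  # [True, False, True]
```

The second send is dropped even if the first message was processed long before, which is exactly the case the comment wants to avoid.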

Perhaps what you need is not processing straight off the queue. Rather, use the queue to send work items to a data store and periodically query the data store to execute work items. That way, if a duplicate message arrives on the queue while there's still the same work item in the data store, the message is discarded and not written to the data store. Otherwise, the message is written to the data store and a periodic process queries the data store to execute the work items.
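
A rough sketch of that data-store approach, in Python. An in-memory dictionary stands in for the real data store, and `PendingWorkStore`, `add`, and `take_batch` are hypothetical names; a real implementation would need an atomic "insert if absent" in whatever store is used.

```python
class PendingWorkStore:
    """In-memory stand-in for the data store holding pending work items."""

    def __init__(self) -> None:
        self._pending = {}  # customer number -> None, keeps insertion order

    def add(self, customer_number: str) -> bool:
        """Store a work item unless the same one is already pending."""
        if customer_number in self._pending:
            return False  # duplicate of a pending item: discard the message
        self._pending[customer_number] = None
        return True

    def take_batch(self, size: int) -> list:
        """Remove and return up to `size` pending items, oldest first."""
        batch = list(self._pending)[:size]
        for customer_number in batch:
            del self._pending[customer_number]
        return batch


store = PendingWorkStore()
# First Save: three customers are affected.
first_save = [store.add(c) for c in ["1001", "1002", "1003"]]
# Second Save while those are still pending: only "1004" is new.
second_save = [store.add(c) for c in ["1002", "1003", "1004"]]
# The periodic process picks up the pending work.
batch = store.take_batch(10)
# Once an item has been taken, the same customer can be queued again.
after_processing = store.add("1002")

print(first_save, second_save, batch, after_processing)
```

In this sketch an item leaves the store when it is taken rather than when the calculation finishes, so a change that arrives mid-calculation schedules a fresh run. Whether that is the right choice depends on the requirements.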