3
votes

I have a simple video encoding worker role that pulls messages from a queue, encodes a video, then uploads the video to storage. Everything seems to be working, but occasionally, when deleting the message after I am done encoding and uploading, I get a "StorageClientException: The specified message does not exist." Although the video is processed, I believe the message is reappearing in the queue because it's not being deleted correctly. I have the message visibility timeout set to 5 minutes, and none of the videos have taken more than 2 minutes to process.

  • Is it possible that another instance of the Worker role is processing and deleting the message?
  • Doesn't the GetMessage() prevent other worker roles from picking up the same message?
  • Am I doing something wrong in the setup of my queue?
  • What could be causing this message to not be found on delete?

some code...

  //onStart() queue setup
  var queueStorage = _storageAccount.CreateCloudQueueClient();
  _queue = queueStorage.GetQueueReference(QueueReference);
  queueStorage.RetryPolicy = RetryPolicies.Retry(5, new TimeSpan(0, 5, 0));
  _queue.CreateIfNotExist();


  public override void Run()
  {
      while (true)
      {
          try
          {
              var msg = _queue.GetMessage(new TimeSpan(0, 5, 0));
              if (msg != null)
              {
                  EncodeIt(msg);
                  PostIt(msg);
                  _queue.DeleteMessage(msg);
              }
              else
              {
                  Thread.Sleep(WaitTime);
              }
          }
          catch (StorageClientException exception)
          {
              BlobTrace.Write(exception.ToString());
              Thread.Sleep(WaitTime);
          }
      }
  }
3 Answers

3
votes

If the encoding process takes longer than the message invisibility timeout (5 minutes in your case), the message will show up in the queue again, and a second worker will start processing it. However, chances are that by the time the second worker finishes, the first worker will already be done with the work and will have deleted the message properly. The second worker then fails at the deletion phase, since the message no longer exists for it.

This happens because of the lightweight transactional model of Windows Azure Queues. It guarantees that a message will be processed at least once (even if a worker fails silently), but it does not guarantee "only once" processing.
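To make the at-least-once behaviour concrete, here is a minimal in-memory sketch. It is not the Azure API: `ToyQueue`, `AtLeastOnceDemo`, and the explicit `now` parameters are all hypothetical names made up for illustration. The toy also ignores pop receipts, which the real service additionally uses to decide which delete is accepted.

```csharp
using System;
using System.Collections.Generic;

// Toy queue that only models visibility timeouts (hypothetical, not the Azure API).
public sealed class ToyQueue
{
    // Message id -> time at which the message becomes visible again.
    private readonly Dictionary<int, DateTime> _visibleAt = new Dictionary<int, DateTime>();
    private int _nextId;

    public void Add(DateTime now)
    {
        _visibleAt[_nextId++] = now;
    }

    // Returns the id of a visible message and hides it for `timeout`,
    // or null when nothing is currently visible.
    public int? Get(TimeSpan timeout, DateTime now)
    {
        foreach (var pair in _visibleAt)
        {
            if (pair.Value <= now)
            {
                int id = pair.Key;
                _visibleAt[id] = now + timeout; // safe: we return right after the write
                return id;
            }
        }
        return null;
    }

    // Returns false when the message has already been deleted.
    public bool Delete(int id)
    {
        return _visibleAt.Remove(id);
    }
}

public static class AtLeastOnceDemo
{
    public static string Run()
    {
        var t0 = new DateTime(2000, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var timeout = TimeSpan.FromMinutes(5);
        var queue = new ToyQueue();
        queue.Add(t0);

        int? a = queue.Get(timeout, t0);                    // worker A takes the message
        int? hidden = queue.Get(timeout, t0.AddMinutes(2)); // still invisible
        int? b = queue.Get(timeout, t0.AddMinutes(6));      // timeout elapsed, worker B gets it too

        bool aDeleted = queue.Delete(a.Value); // first delete succeeds
        bool bDeleted = queue.Delete(b.Value); // second delete finds nothing
        return string.Format("{0} {1} {2}", hidden.HasValue, aDeleted, bDeleted);
    }
}
```

Calling `AtLeastOnceDemo.Run()` returns `"False True False"`: the message is hidden at 2 minutes, both workers end up holding it after the timeout, and only one of the two deletes succeeds.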

Since your encoding process seems to be idempotent and lightweight (the error shows up infrequently), I'd advise just increasing the invisibility timeout and explicitly catching this exception (by status code) around DeleteMessage, optionally logging the processing duration so that you can tune the invisibility timeout further.
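A rough sketch of that advice follows, kept independent of the storage library so it runs on its own. `ProcessThenDelete`, `ProcessResult`, and the `isMessageNotFound` predicate are hypothetical names. In the question's worker, `process` would wrap `EncodeIt(msg)` and `PostIt(msg)`, `delete` would wrap `_queue.DeleteMessage(msg)`, and the predicate would inspect the status or error code carried by the `StorageClientException`; which property to check is an assumption to verify against the library version you use.

```csharp
using System;
using System.Diagnostics;

public sealed class ProcessResult
{
    public TimeSpan Elapsed;         // how long processing took
    public bool Deleted;             // false when the delete hit "message not found"
    public bool ExceededTimeout;     // true when processing outlasted the invisibility window
}

public static class QueueWorkerHelper
{
    public static ProcessResult ProcessThenDelete(
        Action process,
        Action delete,
        Func<Exception, bool> isMessageNotFound,
        TimeSpan visibilityTimeout)
    {
        // Time the work so the duration can be logged and compared with the timeout.
        var watch = Stopwatch.StartNew();
        process();
        watch.Stop();

        var result = new ProcessResult
        {
            Elapsed = watch.Elapsed,
            ExceededTimeout = watch.Elapsed > visibilityTimeout,
            Deleted = true
        };

        try
        {
            delete();
        }
        catch (Exception ex)
        {
            // Swallow only the expected "already deleted" case; rethrow everything else.
            if (!isMessageNotFound(ex)) throw;
            result.Deleted = false;
        }
        return result;
    }
}
```

Logging `Elapsed` and `ExceededTimeout` for every message shows whether the 5 minute window is actually being exceeded, or whether the failed delete has some other cause.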

1
votes

Is it possible it's taking longer than the five minutes you've set as a timeout?

0
votes

I had my development, production, and staging environments all pulling from the same queue, which was causing some strange behavior. I believe this to be the culprit.