4 votes

I have one azure storage table where I have a bunch of tasks to be completed by a worker role at a certain time. Example:

       Task 1: -> run every 5 min
       Task 2: -> run every 1 min
       Task 3: -> run every 10 min
       Task 4: -> run every 1 min
       Task 5: -> run every 5 min
       ...........................
       Task 1000: -> run every 1 min

Is this approach correct: Each task has a DateTime column called "LastRun". There is another column called "RunEvery" that stores the interval at which the task has to be executed. The worker role iterates through all tasks continuously and, for each task, checks the "LastRun" column with the following approach:

      DateTime currentTime = DateTime.Now;
      if (currentTime >= (myTask.LastRun + myTask.RunEvery))
      {
           myTask.Execute();
      }
      else
      {
           // move on and check the next task in the table
      }

What about resource consumption if the worker role runs continuously? How can we spare resources? Or can I implement this in a better way? What is your advice?
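To make the loop concrete, here is a sketch of the idea with the iteration spelled out and a sleep so the role is not spinning at 100% CPU (`ScheduledTask` here is a simplified stand-in for the table entity, not the real Azure entity class):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ScheduledTask
{
    public DateTime LastRun { get; set; }
    public TimeSpan RunEvery { get; set; }
    public void Execute() { LastRun = DateTime.UtcNow; /* do the actual work here */ }
}

static class Worker
{
    // A task is due when its interval has elapsed since the last run.
    public static bool IsDue(ScheduledTask task, DateTime now) =>
        now >= task.LastRun + task.RunEvery;

    public static void RunLoop(IList<ScheduledTask> tasks)
    {
        while (true)
        {
            DateTime now = DateTime.UtcNow;         // UTC avoids DST surprises
            foreach (var task in tasks)
                if (IsDue(task, now))
                    task.Execute();

            Thread.Sleep(TimeSpan.FromSeconds(30)); // don't spin between sweeps
        }
    }
}
```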

I might have written what you are looking for a couple of days ago. It's a C# scheduler relying on Azure Tables, designed for low transaction consumption (3 table storage transactions per 50 scheduled tasks), to be used in the context of multiple role instances. Scheduled tasks are "transactional", which means that you'll have to call .Handled(); when a task is complete (otherwise it will be fired again "x" hours later). If interested, I still need to clean the code a bit, then I can publish it on GitHub. – uzul
Definitely interested! Seems interesting! Please let me know when it is ready! Thanks in advance! – David Dury
I'm on the way to publishing it. – uzul
@uzul Can you please let me know when you publish the article and where I can find it... Appreciated! – David Dury

6 Answers

12 votes

Adding to @Simon Munro's answer: yet another way to implement task scheduling without external scheduler dependencies is to make use of the Quartz library (http://quartznet.sourceforge.net/) in your worker role. I've used it in one of my projects and it works extremely well. It gives you a lot of flexibility as far as scheduling tasks is concerned. You would still need to make use of blob leasing and Windows Azure Queues to take care of concurrency issues among multiple instances of your worker role.
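For illustration, a minimal sketch of a recurring job with Quartz.NET's fluent API (2.x style); `CheckTasksJob` and the one-minute interval are placeholders, not something prescribed by the question:

```csharp
using Quartz;
using Quartz.Impl;

public class CheckTasksJob : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // read the schedule table and run the tasks that are due
    }
}

public static class SchedulerSetup
{
    // Call this from the worker role's OnStart()/Run().
    public static void Start()
    {
        IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
        scheduler.Start();

        IJobDetail job = JobBuilder.Create<CheckTasksJob>()
            .WithIdentity("checkTasks")
            .Build();

        ITrigger trigger = TriggerBuilder.Create()
            .StartNow()
            .WithSimpleSchedule(s => s.WithIntervalInMinutes(1).RepeatForever())
            .Build();

        scheduler.ScheduleJob(job, trigger);
    }
}
```

Quartz also supports cron-style triggers if the per-task intervals need to be more expressive than a fixed repeat.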

UPDATE: Inspired by this, I wrote a blog post regarding the same which you can read here: http://gauravmantri.com/2013/01/23/building-a-simple-task-scheduler-in-windows-azure/.

3 votes

Rolling your own scheduling is not a good idea. You get into all sorts of problems unless you lock the data that you are reading. Can you, for example, scale up to tens or hundreds of instances of the same worker role and be sure that each job is only run the required number of times? You may have to 'lock' your tasks using something like the leases on blob storage.

Although the number of jobs you are looking at may be too high, a good approach is to use a cron job service like SetCronJob or the newly released Aditi cloud services. You implement your tasks as web services and hook them up to an external service.

In terms of your resource utilisation, a timer that triggers an event every few minutes is not going to use much in the way of resources. You could have a single thread that executes tasks read out of a queue (even a ConcurrentQueue), so you are only executing one task at a time (if precise timing is not an issue). Other threads/timers/events can add tasks to the queue.
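The single-consumer idea can be sketched like this (illustrative only; the producers would be your timers or events):

```csharp
using System;
using System.Collections.Concurrent;

static class TaskPump
{
    static readonly ConcurrentQueue<Action> Queue = new ConcurrentQueue<Action>();

    // Timers/other threads enqueue work...
    public static void Enqueue(Action task) => Queue.Enqueue(task);

    // ...while a single consumer thread drains it, one task at a time.
    public static int DrainOnce()
    {
        int executed = 0;
        Action task;
        while (Queue.TryDequeue(out task))
        {
            task();       // only this thread runs tasks, so no overlap
            executed++;
        }
        return executed;
    }
}
```

Because only one thread ever calls `DrainOnce`, the tasks themselves never run concurrently, even though many threads may enqueue.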

1 vote

Your current approach does not look like it will scale up to multiple worker roles.

I would suggest a couple of changes:

  1. Use a Storage Queue to store tasks that are ready to be executed. Add a message to the queue when a task is ready to run; that way other worker roles can participate in executing the tasks. You can also use the queue's visibility delay to hide a message until the task is actually due.

  2. Lock a blob resource (via a lease) when you are reading and updating your schedule table; that way only one worker role can schedule tasks at a time.

Keep in mind that your tasks may (in rare cases) be executed twice, so try and design for that.

To avoid hitting the Queues or Tables too frequently, consider exponentially backing off with a Thread.Sleep when your queue and schedule are empty.
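The back-off could look like this (the delay bounds are illustrative, not prescribed):

```csharp
using System;

static class Backoff
{
    // Double the delay after each empty poll, capped at max.
    public static TimeSpan Next(TimeSpan current, TimeSpan max)
    {
        var doubled = TimeSpan.FromTicks(current.Ticks * 2);
        return doubled < max ? doubled : max;
    }
}

// Usage in the polling loop (sketch):
// var delay = TimeSpan.FromSeconds(1);
// while (true)
// {
//     if (NothingToDo()) { Thread.Sleep(delay); delay = Backoff.Next(delay, TimeSpan.FromMinutes(5)); }
//     else delay = TimeSpan.FromSeconds(1);   // reset once there is work again
// }
```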

This blog contains more details that may help with your implementation.

1 vote

While Gaurav Mantri has written a great article showing how to wire Quartz up with Azure storage Queues/Tables/Blobs in a safe manner, that solution wasn't meeting the requirements of the application I'm working on. Using Queues/Blobs/Tables at the same time might become expensive in terms of Azure transaction cost, which was one of my main concerns.

I'm currently developing an Azure application that needs to schedule a massive number of tasks, so I wrote my own "home" solution a couple of days ago. It is far from the quality of Quartz; so far it's just a prototype that hasn't been tested properly, but it seems to work fine for me.

Design goals

  • Optimize storage transactions as much as possible. This is done using range queries and batch operations only; transactions are grouped as much as possible. Scheduling and fetching 50 tasks can be done with only 3 storage transactions.
  • Each ScheduledTask has to be properly "committed" (otherwise it will be launched again later).
  • Simple and non-intrusive API
  • The Scheduler class is thread-safe and should be safe overall across multiple instances

The concurrency problem is solved using a Delete operation that fails if the task was already dequeued by someone else at the same time (handled internally).
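The delete-as-lock idea can be illustrated with an in-memory stand-in: when several workers race to claim the same task, only the one whose remove succeeds may run it (a ConcurrentDictionary plays the role of the Azure table here; the real code would use a table Delete that fails on a missing entity):

```csharp
using System.Collections.Concurrent;

static class TaskClaim
{
    // Returns true for exactly one caller per task id;
    // everyone else sees "already dequeued" and moves on.
    public static bool TryClaim(ConcurrentDictionary<string, string> pending, string taskId)
    {
        string payload;
        return pending.TryRemove(taskId, out payload);
    }
}
```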

I've just published the project here. It was not originally meant to be published, so please treat it as such. Please let me know if you find a bug.

0 votes

If the tasks do not need to run too frequently, one way is to create an Azure SQL table and generate a row for each execution. The columns would then be the execution time and some identifier for the task that should run. So if a task runs once a day and you want to keep it running for 5 years, you would insert 5*365 rows.

The worker would run an infinite loop, selecting from that table the tasks whose execution time is less than the current time but which have not been executed yet. With multiple workers you would need to use transactions to ensure each task gets executed by just one worker.
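One way to make the "just one worker" guarantee concrete on SQL Server is an atomic claim query; a sketch, where the `TaskRuns` table, its columns, and the `@worker` parameter are all hypothetical names:

```csharp
static class TaskRunClaims
{
    // Hypothetical schema: TaskRuns(Id, TaskId, ExecuteAt, ExecutedBy NULL)
    // The single UPDATE ... OUTPUT both marks a due row as taken and returns it,
    // so each row is handed to exactly one worker; READPAST lets other workers
    // skip rows that are currently locked instead of blocking on them.
    public const string ClaimSql = @"
        UPDATE TOP (1) dbo.TaskRuns WITH (READPAST)
        SET ExecutedBy = @worker
        OUTPUT inserted.Id, inserted.TaskId
        WHERE ExecuteAt <= SYSUTCDATETIME() AND ExecutedBy IS NULL;";
}
```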

Or you could even use a similar mechanism with Azure Service Bus. Service Bus supports scheduled delivery, and there does not seem to be an upper limit on message time-to-live. With Service Bus you would simply push a message for each planned execution, with the delivery time set to the execution time. The worker would then pop messages from the queue.

One benefit of using Service Bus is that you can easily add more workers without fear that they would start working on the same tasks.
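The scheduled-delivery idea above might look like this with the classic BrokeredMessage API (a sketch; the `QueueClient` setup and the task id are placeholders):

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

static class SchedulePublisher
{
    // client would be created elsewhere, e.g. from a connection string.
    public static void Schedule(QueueClient client, string taskId, DateTime runAtUtc)
    {
        var message = new BrokeredMessage(taskId)
        {
            ScheduledEnqueueTimeUtc = runAtUtc  // invisible to receivers until this time
        };
        client.Send(message);
    }
}
```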

0 votes

Might be answering an old question, but instead of using heavyweight cron-like libraries (there are so many of them), it's probably worth investing a bit of time to learn Rx, the Reactive Extensions, and use the timer there. A simple example from the Rx wiki
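A minimal sketch of the same idea (not the wiki's original code; `Observable.Interval` is from Rx.NET's System.Reactive.Linq):

```csharp
using System;
using System.Reactive.Linq;

static class RxScheduling
{
    public static IDisposable Start()
    {
        // Emits 0, 1, 2, ... once per minute; the subscription callback
        // is where you would check the schedule and run due tasks.
        return Observable
            .Interval(TimeSpan.FromMinutes(1))
            .Subscribe(tick => Console.WriteLine("checking due tasks, tick {0}", tick));
    }
}

// Disposing the subscription stops the timer:
// var sub = RxScheduling.Start();
// sub.Dispose();
```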