1 vote

I am considering moving my web application to Windows Azure for scalability purposes but I am wondering how best to partition my application.

I expect my scenario is typical and is as follows: my application allows users to upload raw data; this is processed, and a report is generated. The user can then review their raw data and view their report.

So far I'm thinking of a web role and a worker role. However, I understand that a VHD can only be mounted for read/write access by a single instance, yet both my web role and my worker role need access to a common file store. So perhaps I need a web role and two separate worker roles: one for the processing, and the other for reading from and writing to the file store. Is this a good approach?

I am having difficulty picturing the plumbing between the roles, and I am concerned about the overhead the communication between these partitions would introduce, so I would welcome any input here.


2 Answers

2 votes

Adding to Stuart's excellent answer: blobs can store anything, with sizes up to 200GB. If you need to persist an entire durable directory structure, you can mount a VHD with just a few lines of code. It's an NTFS volume that your app can interact with, just like any other drive.
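
For illustration, the mount really is only a few lines. This is a minimal sketch against the Azure Drive (CloudDrive) API; the local resource name, the configuration setting name, and the VHD blob path are all assumptions you would adapt to your service definition:

```csharp
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Initialize the local read cache for the drive. "DriveCache" is an assumed
// local-resource name that you would declare in ServiceDefinition.csdef.
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

// The drive is backed by a page blob; setting name and blob path are assumptions.
CloudStorageAccount account = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");

try { drive.Create(1024); }                    // size in MB
catch (CloudDriveException) { /* drive already exists */ }

// Mount returns a path (e.g. "a:\"); from here it's plain NTFS file IO.
string root = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
File.WriteAllText(Path.Combine(root, "hello.txt"), "hello from a cloud drive");
```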

In your case, a VHD doesn't fit well, because your web app would have to mount the VHD and be its sole writer. And if you have more than one web role instance (which you would if you want the SLA and want to scale), only one of them could be that writer. Individual blobs fit MUCH better here.

As Stuart stated, this is a very normal and common pattern. And again, with only a few lines of code, you can call on the Storage SDK to copy a file from blob storage to your instance's local disk. Then you can process the file using regular File IO operations. When your report is complete, another few lines of code lets you copy your report into a new blob (most likely in a well-known container that the web role knows to look in).
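
Roughly, that blob-in / process / blob-out flow looks like this. It's a sketch using the StorageClient library; the container names, file names, and the GenerateReport helper are assumptions for illustration:

```csharp
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

string connectionString = "UseDevelopmentStorage=true";   // swap in your real account
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();

// 1. Copy the uploaded raw file from blob storage to this instance's local disk.
CloudBlob rawBlob = blobClient.GetBlobReference("uploads/customer1/raw-001.csv");
string localPath = Path.Combine(Path.GetTempPath(), "raw-001.csv");
rawBlob.DownloadToFile(localPath);

// 2. Process it with regular File IO (GenerateReport is a hypothetical helper).
string reportPath = GenerateReport(localPath);

// 3. Copy the finished report into a well-known container the web role reads.
CloudBlobContainer reports = blobClient.GetContainerReference("reports");
reports.CreateIfNotExist();
reports.GetBlobReference("customer1/report-001.pdf").UploadFile(reportPath);
```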

You can take this a step further and insert rows into an Azure table, partitioned by customer, with the row key identifying the individual uploaded file and a third property holding the URI of the completed report. This makes it trivial for the web app to display a customer's completed reports.
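
As a sketch (the entity shape and table name are assumptions; this uses the StorageClient table support):

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// PartitionKey = customer, RowKey = uploaded file id, plus the report URI.
public class ReportEntity : TableServiceEntity
{
    public ReportEntity() { }
    public ReportEntity(string customerId, string uploadId, string reportUri)
        : base(customerId, uploadId) { ReportUri = reportUri; }

    public string ReportUri { get; set; }
}

public static void RecordReport(CloudTableClient tables,
    string customer, string uploadId, string reportUri)
{
    tables.CreateTableIfNotExist("Reports");
    TableServiceContext ctx = tables.GetDataServiceContext();
    ctx.AddObject("Reports", new ReportEntity(customer, uploadId, reportUri));
    ctx.SaveChanges();
}
```

The web app can then list all of a customer's reports with a single query against that customer's partition.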

1 vote

Blob storage is the easiest place to store files that lots of roles and role instances can then access, with none of them requiring special access.

The normal pattern suggested seems to be (sketched in code after the list):

  • allow the raw files to be uploaded using instances of a web role
  • these web role instances return from the HTTP call without doing any processing - they store the raw files in blob storage, and add a "do this work" message to a queue.
  • the worker role instances pick up the message from the queue, read the raw blob, do the work, store the report result, then delete the message from the queue
  • all the web roles can then access the report when the user asks for it
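
Wiring that up is only a few lines on each side. This is a sketch with the StorageClient queue API; the queue name, blob name, connection string, and ProcessRawBlob helper are assumptions:

```csharp
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

string connectionString = "UseDevelopmentStorage=true";   // swap in your real account
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudQueue workQueue = account.CreateCloudQueueClient().GetQueueReference("work-items");
workQueue.CreateIfNotExist();

// Web role, after saving the raw upload to blob storage:
workQueue.AddMessage(new CloudQueueMessage("uploads/customer1/raw-001.csv"));

// Worker role Run() loop:
while (true)
{
    CloudQueueMessage msg = workQueue.GetMessage();
    if (msg == null) { Thread.Sleep(1000); continue; }

    ProcessRawBlob(msg.AsString);   // hypothetical: read raw blob, build report, store it
    workQueue.DeleteMessage(msg);   // delete only after the work has succeeded
}
```

Deleting the message only after processing matters: if a worker instance crashes mid-job, the message reappears on the queue after its visibility timeout and another instance can pick it up.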

That's the "normal pattern suggested", and you can see it implemented in things like the photo upload/thumbnail generation apps from the very first Azure PDC. It's also used in this training course - follow through to the second page.

Of course, in practice you may need to build on this pattern depending on the size and type of data you are processing.