Architecting multi-service enterprise applications using Azure cloud services

Question

I have some questions regarding architecting enterprise applications using azure cloud services.

Back Story

We have a system made up of about a dozen WCF Windows Services on a SQL backend. We currently have about 10 clients but expect that to grow to potentially a hundred with perhaps a hundred fold increase in the throughput demands on the system. The current system is poorly engineered and is simply not capable of scaling. So now appears to be the appropriate juncture to reengineer on the azure platform.

Process Flow

Let me briefly describe a simplified set of the services and the process flow and then ask some questions I have regarding utilizing azure cloud services to build the new system.

Service A is logged on to an external systems and downloads data continuously

Service B is logged on to a second external systems and downloads data continuously

There can only ever be one logged in instance each of services A and B.

Both A and B hand off their data to Service C which reconciles the data from the two external sources.

Validated and reconciled data is then passed from C to Service D which performs some accounting functions and then passes the resulting data to Services E and F.

Service E is continually logged in to an external system and uploads data to it.

Service F generates reports and publishes them to clients via FTP etc

The system is actually far more complex than this but the above illustrates the processes involved. The system runs 24 hours a day 6 days a week. Queues will be used to buffer messaging between all the services.

We could just build this system using Azure persistent VMs and utilise the service bus, queues etc but that would ties us in to vertical scaling strategy. How could we utilise cloud services to implement it given the following questions.

Questions

Given that Service A, B and E are permanently logged in to external systems there can only ever be one active instance of each. If we implement these as single instance worker roles there is the issue with downtime and patching (which is unacceptable). If we created two instances of each is there a standard way to implement active-passive load balancing with worker roles on azure or would we have to build our own load balancer? Is there another solution to this problem that I haven’t thought of?
Services C and D are a good candidates to scale using multiple worker role instance. However each instance would have to process related data. For example, we could have 4 instances each processing data for 5 individual clients. How can we get messages to be processed in groups (client centric) by each instance? Also, how would we redistribute load from one instance to the remaining instances when patching takes place etc. For example, if instance 1, which processes data for 5 clients, goes down for OS patching, the data for its clients would then have to be processed by the remaining instances until it came back up again. Similarly, how could we redistribute the load if we decide to spin up additional worker roles?

Any insights or suggestions you are able to offer would be greatly appreciated.

Mat

Igorek Igorek · Accepted Answer · 2013-03-11T04:29:47

Question #1: you will have to implement your own load-balancing. This shouldn't be terribly complex as you could use Blob storage Lease functionality to keep a mutex on some blob in the storage from one instance while holding the connection active to your external system. Every X period of time you could renew the lease if you know that connection is still active and successful. Every other worker in the Role could be checking on that lease to see if it expires. If it ever expires, the next worker would jump in and acquire the lease, and then open the connection to the external source.

Question #2: Look into Azure Service Bus. It has a capability to allow clients to process related messages. More info here: http://geekswithblogs.net/asmith/archive/2012/04/02/149176.aspx All queuing methodologies imply that if a message gets picked up but does not get processed within a configurable amount of time, it goes back onto the queue so that the next available instance can pick it up and process it

You can use something like AzureWatch to monitor the depth of your queues (storage or service bus) and auto-scale number of instances in your C and D roles to match; and monitor instance statuses for roles A, B and E to make sure there is always a healthy instance there and auto-scale if quantity of ready instances drop to 0.

HTH

Architecting multi-service enterprise applications using Azure cloud services

2 Answers