11
votes

I'm just doing a little research on how to build a large-scale, globally accessible application on Azure.

There's lots of tech in there already to get your application as close to the consumer as possible.

  • CDN Edge Servers for static content shared around the world.
  • Cloud Services in different regions, using Traffic Manager to route a domain name to the nearest application host.

The bit I'm still confused about is the database. If you are using SQL Azure, you have to specify a region to place it in. If my SQL Azure instance is in West Europe (Amsterdam), but my customers are in Australia, accessing the application via an instance in Australia (NSW), there's going to be significant latency between the application and the database.
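To put a rough number on that worry (the round-trip figure below is an assumption, not a measurement), even a moderately chatty page compounds the distance quickly:

```python
# Rough arithmetic only: the RTT is an assumed figure for an
# Amsterdam <-> Sydney round trip, not something I've measured.
RTT_SECONDS = 0.280        # assumed app -> DB -> app round trip
QUERIES_PER_REQUEST = 5    # assumed sequential queries per page view

added_latency_ms = RTT_SECONDS * QUERIES_PER_REQUEST * 1000
print(f"Added latency per page view: {added_latency_ms:.0f} ms")  # ~1400 ms
```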

All the references I've seen to Geo-Replication seem to be in the context of a master-slave redundancy setup. But I'm wondering: is it feasible to have a master-master setup where each application instance talks to its own SQL Azure master instance in the same geo-region, and SQL Azure takes care of the bi-directional replication between them?

2
Well, does SQL Azure actually support geo master-master and 'take care' of bi-directional replication? – Remus Rusanu
That's kinda what I'm trying to figure out. And if Geo-Replication doesn't support this, is there some other way to do multi-master replication? I'm on a bit of a first-principles fact-finding mission at the minute and looking for guidance. – Eoin Campbell
I just stumbled across something called "Data Sync" that might be what I'm looking for. – Eoin Campbell

2 Answers

8
votes

Active Geo-Replication for Azure SQL Database:

The Active Geo-Replication feature implements a mechanism to provide database redundancy within the same Microsoft Azure region or in different regions (geo-redundancy). Active Geo-Replication asynchronously replicates committed transactions from a database to up to four copies of the database on different servers. The original database becomes the primary database of the continuous copy. Each continuous copy is referred to as an active secondary database. The primary database asynchronously replicates committed transactions to each of the active secondary databases. While at any given point, the active secondary data might be slightly behind the primary database, the active secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database. Active Geo-Replication supports up to four active secondaries, or up to three active secondaries and one offline secondary.

One of the primary benefits of Active Geo-Replication is that it provides a database-level disaster recovery solution. Using Active Geo-Replication, you can configure a user database in the Premium service tier to replicate transactions to databases on different Microsoft Azure SQL Database servers within the same or different regions. Cross-region redundancy enables applications to recover from a permanent loss of a datacenter caused by natural disasters, catastrophic human errors, or malicious acts.

Another key benefit is that the active secondary databases are readable. Therefore, an active secondary can act as a load balancer for read workloads such as reporting. While you can create an active secondary in a different region for disaster recovery, you could also have an active secondary in the same region on a different server. Both active secondary databases can be used to balance read only workloads serving clients distributed across several regions.

Notice that master-master is nowhere mentioned. Replicas are readable, never writable. So the question is really moot, as SQL Azure simply does not support what you wish.
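The secondaries are readable, though, so reads can still be served locally. As a hedged sketch of what that looks like from application code (server names, credentials, and driver version are placeholder assumptions, and it presumes a readable secondary already exists in the nearby region):

```python
import pyodbc

# Sketch only: "myserver-weu" / "myserver-aus" and the credentials are
# invented placeholders; substitute your own geo-replication pair.
PRIMARY = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver-weu.database.windows.net;DATABASE=appdb;"
    "UID=appuser;PWD=...;"
)
# Reads can be pointed at the nearby readable secondary; writes cannot.
SECONDARY_READONLY = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver-aus.database.windows.net;DATABASE=appdb;"
    "UID=appuser;PWD=...;ApplicationIntent=ReadOnly;"
)

def get_connection(for_write: bool) -> pyodbc.Connection:
    """Route writes to the primary, reads to the local secondary."""
    return pyodbc.connect(PRIMARY if for_write else SECONDARY_READONLY)
```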

An alternative would be application-layer sharding: have each tenant connect to a database in its own region. But that assumes the data is disjoint (Australian customers do not look at South American items). See this answer here.
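A minimal sketch of that sharding idea, with made-up region keys and connection strings:

```python
# Hedged sketch of application-layer sharding by tenant home region.
# Region codes and connection strings are illustrative placeholders.
SHARD_MAP = {
    "west-europe": "Server=sql-weu.database.windows.net;Database=appdb;...",
    "australia":   "Server=sql-aus.database.windows.net;Database=appdb;...",
}

def connection_string_for(tenant: dict) -> str:
    """Each tenant lives wholly in one regional database (disjoint data)."""
    return SHARD_MAP[tenant["home_region"]]
```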

You can also investigate things like Cassandra, which does support what you want but is a major paradigm shift and you'll need to host it and manage it.

But you also have to ask: is a master-master DB required to achieve low latency? Are writes really occurring that often in your app? Read latency can easily be improved; that's what caching and CDNs are for. Think about all the Australian users reading this question: it is served from a database that is geo-replicated for disaster recovery, not from a master-master DB. See How StackOverflow scales SQL Server.
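For illustration, a toy read-through cache (in reality Redis/Azure Cache or a CDN would fill this role; everything here is a simplified stand-in):

```python
import time

# Minimal regional read-through cache sketch. An in-process dict is a
# stand-in for a real cache tier; the TTL value is arbitrary.
_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 60

def cached_read(key: str, load_from_db) -> object:
    """Serve repeat reads locally; only cache misses pay the cross-region trip."""
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    value = load_from_db(key)          # the expensive cross-region query
    _cache[key] = (time.time(), value)
    return value
```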

2
votes

Caveat: I have not worked with SQL Azure in this respect, but I have worked with geo replication extensively.

From what I can tell, you are correct to say that the Active Geo-Replication built into Azure is a one-way copy: you have a master database in one location which ships transactions out to other databases that are available on a read-only basis.

Getting full two-way replication is a very tricky task. The opportunities for failure are enormous and exceedingly hard to test, which is why few people offer two-way replication for transactional databases: even if two databases hold the same data, they will have different transaction histories and will not accurately mirror one another. Then, when you have to decide which database is authoritative, things get complicated fast.
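A toy example of that authority problem (the fields and the merge policy are invented for illustration, not a recommendation):

```python
# Two regions update the same row concurrently, so their histories
# diverge. A last-writer-wins merge (shown here) silently discards one
# update; alternatives (vector clocks, CRDTs, app-level merge) trade
# complexity for safety.
row_in_europe    = {"id": 1, "email": "old@example.com", "updated_at": 100}
row_in_australia = {"id": 1, "email": "new@example.com", "updated_at": 105}

def merge_last_writer_wins(a: dict, b: dict) -> dict:
    return a if a["updated_at"] >= b["updated_at"] else b

print(merge_last_writer_wins(row_in_europe, row_in_australia))
# Europe's write is lost; whether that is acceptable depends on your data.
```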

However, this does not necessarily prevent you from implementing practical two-way replication. When you know your own data and understand what needs to be replicated and what does not, you are no longer solving replication as an abstract problem; you can design around the data you actually have. If you are contemplating working at this kind of scale, you will already be using a lot of queues to pass data around. To take a very simple example: if your service pushes data into a queue for a worker to pop and store in the database, it would not be hard to push the same data into transfer queues bound for your other geographical regions as part of the same processing step, as sketched below.
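A minimal sketch of that fan-out, using in-process queues as stand-ins for whatever messaging service you actually run (Service Bus, Storage queues, etc.; the region names are placeholders):

```python
import json
import queue

# One local queue feeds the regional database writer; the same payload
# is mirrored to transfer queues bound for the other regions.
local_db_queue = queue.Queue()
transfer_queues = {"west-europe": queue.Queue(), "australia": queue.Queue()}

def publish_write(payload: dict, home_region: str) -> None:
    message = json.dumps(payload)
    local_db_queue.put(message)        # picked up by the local DB writer
    for region, q in transfer_queues.items():
        if region != home_region:
            q.put(message)             # replicated asynchronously elsewhere
```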

Ultimately, you need to ask yourself how many million users you have and how many gigabytes of data they are going to push into your databases. If those numbers are fairly low, then two-way replication is almost certainly unnecessary, and thinking too hard about it is probably a premature optimisation.