12
votes

I've read many posts and articles comparing SQL Azure and the Table Service, and most of them say that the Table Service is more scalable than SQL Azure.

But the http://azurescope.cloudapp.net/BenchmarkTestCases/ benchmarks show a different picture.

My case, using SQL Azure: one table with many inserts, about 172,000,000 per day (roughly 2,000 per second). Can I expect good performance for inserts and selects when I have 2 million records, or billions of records, in one table?

Using the Table Service: one table with some number of partitions. The number of partitions can be large, very large.

Question #1: does the Table Service have any limitations or best practices for creating many, many, many partitions in one table?

Question #2: within a single partition I have a large number of small entities, as in the SQL Azure example above. Can I expect good performance for inserts and selects when I have 2 million, or billions of, entities in one partition?

I know about sharding and partitioning solutions, but this is a cloud service; isn't the cloud powerful enough to do all that work without my own code?

Question #3: can anybody show me benchmarks for querying large amounts of data in SQL Azure and the Table Service?

Question #4: maybe you can suggest a better solution for my case?


2 Answers

6
votes

Short Answer

  1. I haven't seen lots of partitions cause Azure Tables (AZT) problems, but I don't have this volume of data.
  2. The more items in a partition, the slower queries in that partition become.
  3. Sorry, no, I don't have benchmarks.
  4. See below.

Long Answer

In your case I suspect that SQL Azure is not going to work for you, simply because of the limit on the size of a SQL Azure database. If each of the rows you're inserting is 1K with indexes, then 172 million rows a day is roughly 172GB a day, so you would hit the 50GB limit in a matter of hours. It's true that Microsoft has talked about databases bigger than 50GB, but they've given no time frame for that. SQL Azure also has a throughput limit that I'm unable to find at this point (I'm pretty sure it's less than what you need, though). You might be able to get around both limits by partitioning your data across more than one SQL Azure database.
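A minimal sketch of that kind of cross-database partitioning, assuming hypothetical database names (the actual connection and insert logic is left out): the idea is just to hash a key so the same key always routes to the same database.

```python
# Hypothetical sketch: routing writes across N SQL Azure databases by
# hashing a key. The shard names are made up for illustration; this is
# not a real Azure API, just the routing idea.
import hashlib

SHARD_COUNT = 4
SHARDS = [f"mydb-shard-{i}" for i in range(SHARD_COUNT)]  # hypothetical names

def shard_for(key: str) -> str:
    # A stable hash, so the same key always lands in the same database
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % SHARD_COUNT
    return SHARDS[index]
```

Reads for a single key go to one known database; queries spanning all keys have to fan out across every shard, which is the trade-off you accept.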

The advantage SQL Azure does have, though, is the ability to run aggregate queries. In AZT you can't even do a select count(*) from customer without loading every customer.
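To illustrate what that means in practice, here is a toy sketch (plain Python, not the Azure SDK) of a client-side count: AZT returns results in pages of up to 1,000 entities with continuation tokens, and every page has to come back to the client just to produce a number.

```python
# Toy sketch, not the Azure SDK: AZT has no server-side aggregates,
# so "select count(*)" means paging every entity back to the client
# and counting there.
def count_entities(query_pages):
    # query_pages yields one list of entities per returned page
    return sum(len(page) for page in query_pages)

# Two simulated result pages: a full page of 1,000 and a final page of 250
pages = [[{"id": i} for i in range(1000)], [{"id": i} for i in range(250)]]
total = count_entities(pages)
```

With the row counts in the question, that's hundreds of millions of entities transferred just to count them.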

AZT also has a limit of 500 transactions per second per partition, and a limit of "several thousand" per second per account. At your rate of about 2,000 inserts per second, you'd need to spread writes across at least four partitions just to stay under the per-partition limit.

I've found that choosing what to use for your partition key (PK) and row key (RK) depends on how you're going to query the data. If you want to access each of these items individually, simply give each row its own partition key and a constant row key. This will mean that you have lots of partitions.

For example, suppose the rows you're inserting are orders, and each order belongs to a customer. If it were more common for you to list orders by customer, you would use PK = CustomerId, RK = OrderId. To find a customer's orders you would then simply query on the partition key; to get a specific order you'd need to know both the CustomerId and the OrderId. The more orders a customer had, the slower finding any particular order would be.

If you just needed to access orders by OrderId, then you would use PK = OrderId, RK = string.Empty and put the CustomerId in another property. You can still write a query that brings back all orders for a customer, but because AZT doesn't support indexes on anything other than PartitionKey and RowKey, any query that doesn't use the PartitionKey (and sometimes even one that does, depending on how you write it) will cause a table scan. With the number of records you're talking about, that would be very bad.
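A toy in-memory model (not the Azure SDK) of the two key designs above may make the trade-off concrete: partitions are keyed by PartitionKey, and entities are keyed by RowKey within a partition.

```python
# Toy model of Azure Table keys: PartitionKey -> {RowKey -> entity}.
# This is a sketch of the lookup patterns, not real storage behavior.
table = {}

def insert(pk, rk, entity):
    table.setdefault(pk, {})[rk] = entity

# Design A: PK = CustomerId, RK = OrderId -> "orders for customer" is
# a single-partition query; a specific order needs both keys.
insert("cust-1", "order-100", {"total": 25})
insert("cust-1", "order-101", {"total": 40})
orders_for_customer = table["cust-1"]
specific_order = table["cust-1"]["order-101"]

# Design B: PK = OrderId, RK = "" -> fast lookup by OrderId alone, but
# "orders for customer" degenerates into a scan over every partition.
insert("order-100", "", {"customer": "cust-1", "total": 25})
scan = [e for partition in table.values() for e in partition.values()
        if e.get("customer") == "cust-1"]  # touches every partition
```

In real AZT the Design B scan touches every partition server, which is what makes it so expensive at scale.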

In all of the scenarios I've encountered, having lots of partitions doesn't seem to worry AZT too much.

Another way you can partition your data in AZT, one that is not often mentioned, is to put it in different tables. For example, you might create one table per day. To query the last week, run the same query against seven different tables. If you're prepared to do a bit of work on the client side, you can even run those queries in parallel.
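A sketch of that per-day-table approach, with a hypothetical table-naming scheme and a placeholder query function standing in for the real per-table call; the week's queries run in parallel on a thread pool.

```python
# Sketch: one AZT table per day, queried in parallel for a week.
# Table names and query_table are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def table_name(day: date) -> str:
    # AZT table names must be alphanumeric, so e.g. "orders20240108"
    return "orders" + day.strftime("%Y%m%d")

def query_table(name: str) -> list:
    # Placeholder for the real per-table query against storage
    return [name]

last_week = [date(2024, 1, 8) - timedelta(days=i) for i in range(7)]
with ThreadPoolExecutor(max_workers=7) as pool:
    results = list(pool.map(query_table, (table_name(d) for d in last_week)))
```

A nice side effect of day-sized tables is that dropping old data becomes a single table delete instead of millions of entity deletes.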

0
votes

Azure SQL can easily ingest that much data and more. Here's a video I recorded a few months ago that walks through a sample (available on GitHub) showing one way you can do it.

https://www.youtube.com/watch?v=vVrqa0H_rQA

Here's the full repo:

https://github.com/Azure-Samples/streaming-at-scale/tree/master/eventhubs-streamanalytics-azuresql