1 vote

Is there a way around the 500 entities / second / partition limit with ATS (Azure Table Storage)? I'm OK with dirty reads; if an insert is not immediately available for read, that's OK.

Looking to move some large tables from SQL to ATS.

  • Scale: Because of these tables, the database size is bumping against the 150 GB limit of SQL Azure.

  • Insert speed: The tables are an inverted index (for query speed). Insert order is not sorted by the table's clustered index, which causes rapid SQL table fragmentation. ATS most likely has an insert advantage over SQL.

  • Cost: ATS has a lower monthly cost, but a higher load cost, since the load is millions of rows and cannot be batched because the load order is not grouped by partition (see the batch sketch after this list).

  • Query speed: A search is almost never on just one PartitionKey. A search will have a SQL component and zero or more ATS components. The ATS query is always by PartitionKey and returns RowKeys. A raw search on PartitionKey is fast; the problem is the time to return the entities (rows). A given PartitionKey will have on average 1,000 RowKeys, which is 2 seconds at 500 entities / second / partition. But some PartitionKeys will have over 100,000 RowKeys, which equates to over 3 minutes. In SQL I return 10,000 rows at a time and no query takes over 10 seconds, because with the power of joins I don't have to bring down 100,000 rows to have those rows considered in the WHERE clause.

  • Is there a way around this entity retrieval speed with ATS? For scale and insert speed I would like to go to ATS.
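
A quick illustration of the batching constraint mentioned above: an ATS entity group transaction can only contain entities that share one PartitionKey (and at most 100 entities / 4 MB), so a load stream that is not grouped by partition has to be buffered and grouped by PartitionKey before it can be batched. A minimal sketch, assuming the 1.x StorageClient SDK and a hypothetical IndexEntry entity and table:

    using System.Collections.Generic;
    using System.Data.Services.Client;
    using Microsoft.WindowsAzure.StorageClient;

    // Hypothetical inverted-index row: PartitionKey = search term, RowKey = document id.
    public class IndexEntry : TableServiceEntity
    {
        public IndexEntry() { }
        public IndexEntry(string term, string documentId) : base(term, documentId) { }
    }

    public static class BatchLoader
    {
        // All entities passed in must share one PartitionKey; a batch also may not
        // exceed 100 entities or 4 MB, so callers must chunk accordingly.
        public static void InsertBatch(CloudTableClient tableClient,
                                       string tableName,
                                       IEnumerable<IndexEntry> samePartitionEntries)
        {
            TableServiceContext context = tableClient.GetDataServiceContext();
            foreach (IndexEntry entry in samePartitionEntries)
                context.AddObject(tableName, entry);

            // SaveChangesOptions.Batch sends the group as one entity group transaction.
            context.SaveChangesWithRetries(SaveChangesOptions.Batch);
        }
    }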

Windows Azure Storage Abstractions and their Scalability Targets

How to get most out of Windows Azure Tables

Designing a Scalable Partitioning Strategy for Windows Azure Table Storage

Turn entity tracking off for query results that are not going to be modified: context.MergeOption = MergeOption.NoTracking;
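
For context, a read-only query with tracking turned off might look like the sketch below (assuming the 1.x StorageClient SDK, the hypothetical IndexEntry entity from the batch sketch above, a table named "Index", a placeholder connection string, and the usual Microsoft.WindowsAzure and System.Data.Services.Client usings):

    var account = CloudStorageAccount.Parse("<storage connection string>");
    var context = account.CreateCloudTableClient().GetDataServiceContext();
    context.MergeOption = MergeOption.NoTracking;   // results are read-only, skip identity tracking

    var entities = (from e in context.CreateQuery<IndexEntry>("Index")
                    where e.PartitionKey == "ABCDEFGH"
                    select e).AsTableServiceQuery().Execute();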

2 Answers

2 votes

One potential workaround is to stripe the data across multiple partitions and/or tables, perform queries across all the (sub)partitions in parallel and merge the results.

For example, for striping across partitions, prefixing the partition key with a single digit can multiply the scalability of the partition 10 times.

So a partition key, say ABCDEFGH, could be sub-partitioned as 0ABCDEFGH through 9ABCDEFGH.
Writes are made to a partition with the prefix digit generated either randomly or in round-robin fashion. Reads would query all 10 partitions in parallel and merge the results.
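
A minimal sketch of that pattern, assuming the 1.x StorageClient SDK and the hypothetical IndexEntry entity and table from the question; the prefix is round-robined on write and fanned out over 10 parallel queries on read:

    using System.Collections.Generic;
    using System.Data.Services.Client;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.StorageClient;

    public static class StripedPartition
    {
        private static int _next;

        // Round-robin a 0-9 prefix onto the logical key, e.g. "ABCDEFGH" -> "3ABCDEFGH".
        public static string WriteKey(string logicalKey)
        {
            int prefix = (Interlocked.Increment(ref _next) & int.MaxValue) % 10;
            return prefix + logicalKey;
        }

        // Query all 10 physical sub-partitions in parallel and merge the results.
        public static IList<IndexEntry> ReadAll(CloudTableClient tableClient,
                                                string tableName,
                                                string logicalKey)
        {
            Task<List<IndexEntry>>[] tasks = Enumerable.Range(0, 10)
                .Select(prefix => Task.Factory.StartNew(() =>
                {
                    var context = tableClient.GetDataServiceContext();
                    context.MergeOption = MergeOption.NoTracking;
                    string physicalKey = prefix + logicalKey;
                    return (from e in context.CreateQuery<IndexEntry>(tableName)
                            where e.PartitionKey == physicalKey
                            select e).AsTableServiceQuery().Execute().ToList();
                }))
                .ToArray();

            Task.WaitAll(tasks);                              // each sub-partition runs in parallel
            return tasks.SelectMany(t => t.Result).ToList();  // merge the 10 result sets
        }
    }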

For striping across tables, one of N tables can be written to randomly or in round-robin fashion, and the tables queried similarly in parallel.
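
The table variant is the same shape; only the target table name is striped. A minimal sketch of members that could be added to the StripedPartition class above, assuming hypothetical tables Index0 through Index3 all exist:

    private static int _nextTable;
    private static readonly string[] Tables = { "Index0", "Index1", "Index2", "Index3" };

    // Pick the table for the next write in round-robin fashion.
    public static string WriteTable()
    {
        return Tables[(Interlocked.Increment(ref _nextTable) & int.MaxValue) % Tables.Length];
    }

    // Reads would loop over Tables instead of prefixes, issuing one query per table
    // in parallel and merging the results exactly as in ReadAll above.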

1 vote

Edit: I had originally stated that the limit was 500 transactions/partition/sec. That was incorrect. The limit is actually 500 entities/partition/sec, as stated in the original question.

This also applies to the query speeds you've calculated. If you query an ATS PartitionKey and it returns 1,000 entities, that will likely take only a little longer than returning a single entity, perhaps a few hundred milliseconds more. On the other hand, if the query returns more than 1,000 entities it will be much slower, as each set of 1,000 rows requires an essentially independent transaction and must be fetched serially.
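
For illustration, a minimal sketch assuming the 1.x StorageClient SDK, a TableServiceContext named context, and the hypothetical IndexEntry entity and "Index" table from the question. Execute() returns a lazy enumerable that follows the continuation tokens for you, one serial round trip per batch of at most 1,000 entities:

    var query = (from e in context.CreateQuery<IndexEntry>("Index")
                 where e.PartitionKey == "ABCDEFGH"   // suppose ~100,000 RowKeys under this key
                 select e).AsTableServiceQuery();

    int count = 0;
    foreach (IndexEntry entity in query.Execute())    // ~100 serial round trips of <= 1,000 entities
    {
        count++;
    }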

It's not completely clear to me what you're doing, but it sounds like a lot of querying. Keep in mind that querying ATS on non-key columns tends to be very slow. If you're doing a lot of that, you might be better served by using SQL Azure Federations and fan-out queries instead.