I'd like to evaluate how my Windows Azure Table storage queries scale. For this purpose, I've put together a simple test environment in which I can increase the amount of data in my table and measure the execution times of the queries. Based on those times, I'd like to define a cost function that I can use to estimate the performance of future queries.
I've evaluated the following queries:
1. Query with PartitionKey and RowKey
2. Query with PartitionKey and an attribute
3. Query with PartitionKey and two RowKeys
4. Query with PartitionKey and two attributes
For the last two queries I've checked the following two patterns (referred to below as .1 and .2):
1. PartitionKey == "..." && (RowKey == "..." || RowKey == "...")
2. (PartitionKey == "..." && RowKey == "...") || (PartitionKey == "..." && RowKey == "...")
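For reference, the two patterns correspond to filter strings along the following lines in the table service's OData filter syntax. This is only a Python sketch for illustration: the helper names and the keys `p1`, `r1`, `r2` are made up, and it builds strings without calling any storage library.

```python
from typing import List


def quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def or_inside_partition(partition_key: str, row_keys: List[str]) -> str:
    """Pattern 1: PartitionKey == p && (RowKey == a || RowKey == b)."""
    rows = " or ".join(f"RowKey eq {quote(rk)}" for rk in row_keys)
    return f"PartitionKey eq {quote(partition_key)} and ({rows})"


def or_of_point_lookups(partition_key: str, row_keys: List[str]) -> str:
    """Pattern 2: (PartitionKey == p && RowKey == a) || (PartitionKey == p && RowKey == b)."""
    pk = f"PartitionKey eq {quote(partition_key)}"
    return " or ".join(f"({pk} and RowKey eq {quote(rk)})" for rk in row_keys)


# Hypothetical keys, used only to show the shape of the two filters.
pattern_1 = or_inside_partition("p1", ["r1", "r2"])
pattern_2 = or_of_point_lookups("p1", ["r1", "r2"])
print(pattern_1)
print(pattern_2)
```

Both filters select the same two entities; they differ only in how the conditions are grouped.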
To minimize the transfer delay, I ran the tests on an Azure instance. From the measurements, I can see that:
- query 1 is extremely fast (not surprisingly, as the table is indexed on those fields): it takes about 10-15 ms with about 150,000 entries in the table.
- query 2 requires a partition scan, so its execution time increases linearly with the amount of stored data.
- query 3.1 performs almost exactly like query 2. So this query is also executed with a full partition scan, which seems a bit odd to me.
- query 4.1 is a bit more than twice as slow as query 3.1. So it seems to be evaluated with two partition scans.
- and finally, queries 3.2 and 4.2 are almost exactly 4 times slower than query 2.
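To make the cost function idea concrete, here is a minimal Python sketch that fits a straight line (fixed overhead plus a per-entity scan cost) to timings by ordinary least squares. The linear model is an assumption based on the behaviour of query 2, the function names are made up, and the numbers in the example are placeholders, not my measurements.

```python
def fit_line(sizes, times_ms):
    """Least-squares fit of times_ms = intercept + slope * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_ms(model, entities_in_partition):
    """Estimated time of one partition scan under the fitted linear model."""
    intercept, slope = model
    return intercept + slope * entities_in_partition


# Placeholder data for illustration only (NOT real measurements).
sizes = [50_000, 100_000, 150_000]
times_ms = [110.0, 210.0, 310.0]

model = fit_line(sizes, times_ms)
print(model)
print(estimate_ms(model, 200_000))
```

With real timings in place of the placeholders, the slope would give the per-entity cost of a scan, and a query that appears to run several scans could be estimated as a multiple of that.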
Can you explain the internals of the query/filter interpreter? Even if we accept that query 3.1 needs a partition scan, query 4.1 could be evaluated with the same logic (and in the same time). Queries 3.2 and 4.2 are a mystery to me. Any pointers on those?
Obviously, the whole point of this is that I'd like to fetch several distinct elements within one query, to minimize cost without losing performance. But it seems that issuing a separate query for each element (in parallel, with the Task Parallel Library) is the only really fast solution. What is the accepted way of doing this?
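A rough sketch of the separate-queries approach, written in Python with a thread pool rather than with the TPL. `point_query` and `FAKE_TABLE` are hypothetical stand-ins: a real version would issue one PartitionKey + RowKey lookup against the table service per call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the table: {(PartitionKey, RowKey): entity}.
FAKE_TABLE = {
    ("p1", "r1"): {"value": 1},
    ("p1", "r2"): {"value": 2},
    ("p1", "r3"): {"value": 3},
}


def point_query(partition_key, row_key):
    """Stand-in for a single point lookup; returns None if the entity is missing."""
    return FAKE_TABLE.get((partition_key, row_key))


def fetch_many(partition_key, row_keys, max_workers=8):
    """Run one point lookup per RowKey in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda rk: point_query(partition_key, rk), row_keys))


results = fetch_many("p1", ["r1", "r3", "missing"])
print(results)
```

This keeps every request a point lookup, at the price of one request per element.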