2
votes

I'm very new to Azure Tables, and I'm running into some performance problems. I have a query that fetches thousands of rows using a partition key and a ranged RowKey.

PartitionKey = "Example123" and RowKey >= DateTime.Now.Ticks and RowKey < DateTime.Now.AddHours(1).Ticks.
The RowKey is a GUID prefixed with a DateTime.Ticks string.

This query takes 2-3 seconds to return 8000 entries. Is this reasonable?

Example entry:

A: "C6-85-08-07-06-98",
B: "C6-85-08-07-06-i1",
C: 123,
At: "2013-12-03T19:16:26.0799718Z",
PartitionKey: "example1",
RowKey: "635216949860799718_ca86be88-0995-4da8-90d6-351c615ec9ab",
Timestamp: "2013-12-03T19:16:36.5872058+00:00",
ETag: "W/"datetime'2013-12-03T19%3A16%3A36.5872058Z'""

Example Code
This is the code I'm using (SDK 2.1):

TableQuery<RawDataEntity> rangeQuery = new TableQuery<RawDataEntity>().Where(
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, _partitionKey),
        TableOperators.And,
        IsWithin(from, to)));

// returns after ~3000 ms
var result = _table.ExecuteQuery(rangeQuery).ToList();

// Helper method: builds the RowKey range filter from <= RowKey < to
public static string IsWithin(DateTime from, DateTime to)
{
    return TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, from.Ticks.ToString()),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, to.Ticks.ToString()));
}

If there is nothing wrong with my query, what other ways are there to query a large table (easily over 10k rows) and return 10,000+ rows of data?


2 Answers

1
votes

Considering how the API works, 3 seconds is pretty fast. If you look at how the data is returned by the API using a tool like Fiddler, you will probably notice that your data is fetched over a few requests. The API uses paging to return your data.

If possible, I would recommend querying subsets of your data with multiple parallel queries.
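One way to get subsets that can be queried in parallel is to cut the RowKey time range into contiguous slices. The following is only a rough sketch: the class and method names (RowKeyRanges, Split) are made up for illustration, and it assumes the RowKey format from the question (a Ticks prefix, with every tick string having the same number of digits so that string comparison matches numeric order).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the storage SDK.
public static class RowKeyRanges
{
    // Splits [from, to) into `parts` contiguous, non-overlapping tick ranges.
    // Each range is returned as (inclusive lower bound, exclusive upper bound),
    // formatted the same way as in the question: Ticks.ToString().
    public static List<(string Lower, string Upper)> Split(DateTime from, DateTime to, int parts)
    {
        if (parts < 1) throw new ArgumentOutOfRangeException(nameof(parts));
        if (to < from) throw new ArgumentException("'to' must not be earlier than 'from'.");

        long step = (to.Ticks - from.Ticks) / parts;
        var ranges = new List<(string Lower, string Upper)>(parts);
        for (int i = 0; i < parts; i++)
        {
            long lower = from.Ticks + step * i;
            // The last slice ends exactly at `to`, absorbing any rounding remainder.
            long upper = (i == parts - 1) ? to.Ticks : lower + step;
            ranges.Add((lower.ToString(), upper.ToString()));
        }
        return ranges;
    }
}
```

Each (Lower, Upper) pair can then be used in place of from.Ticks.ToString() and to.Ticks.ToString() in the question's RowKey filter, with one query per slice started on its own task and the results concatenated afterwards. Whether this is actually faster depends on the data distribution and on throttling, so it is worth measuring.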

0
votes

You are correct that the API internally makes multiple requests to return all the rows (the Table service returns at most 1,000 entities per request). Further, since you call .ToList(), you do not get any results until all of the requests have completed and the results are buffered in memory. An alternative would be to execute the query in segments via the ExecuteSegmented[Async] APIs. These allow you to get result pages earlier and, if you wish, to specify a max results count to lower the number of records returned per request.
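The segmented pattern boils down to a loop over continuation tokens. Here is a minimal, SDK-independent sketch of that loop; PagedReader, ReadAll, and the fetchPage delegate are hypothetical names used only for illustration. With the storage client, fetchPage would presumably wrap a call such as _table.ExecuteQuerySegmented(rangeQuery, token) and return the segment's Results and ContinuationToken, but treat that wiring as an assumption and check it against the SDK version you use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the storage SDK.
public static class PagedReader
{
    // Lazily yields items page by page. `fetchPage` receives the previous
    // continuation token (null for the first page) and returns the page's
    // results plus the next token (null when there are no more pages).
    public static IEnumerable<T> ReadAll<T, TToken>(
        Func<TToken, (IReadOnlyList<T> Results, TToken Next)> fetchPage)
        where TToken : class
    {
        TToken token = null; // null means "start from the first page"
        do
        {
            var page = fetchPage(token);
            foreach (var item in page.Results)
            {
                // Items are handed to the caller before the next page is requested.
                yield return item;
            }
            token = page.Next;
        } while (token != null);
    }
}
```

Because the method is an iterator, a caller that enumerates it with foreach can start processing the first page while later pages have not been fetched yet, which is the main difference from calling .ToList() on the full result.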

Additionally, I would recommend upgrading to the 3.0 client library. It supports JSON Light / NoMetadata, which can reduce payloads by up to ~70% depending on the scenario. This will dramatically reduce the IO and CPU usage required to parse these query results, which should help improve the latency of these queries.

Hope this helps.