1 vote

GET /Product()?$filter=((PartitionKey%20eq%20'lIkfA81JpTmv')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lIGcEmrr7hWz')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lIAoy6PqeMVn')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lIjETAtuhYGM')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lIHa0znP5qAk')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lIOCaSXg9YE7')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lInRozGrMa7T')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lILEwwPPcBfe')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lJ14qZv1KFn4')%20and%20(RowKey%20eq%20''))%20or%20((PartitionKey%20eq%20'lIIohzupFLcV')%20and%20(RowKey%20eq%20''))...

A very standard query to Azure Table Storage for a list (50) of known PartitionKey and RowKey pairs. It takes 5 seconds to get the first byte from the server. Is there any way to speed things up?


2 Answers

2 votes

"Or" queries are not optimized the way you might expect them to be. A query like this results in a full table scan. As Gaurav suggested, you really need to do these as separate queries (in parallel) to get fast response times.

I also thoroughly disagree with Astaykov's statement that you shouldn't bother optimizing because your performance is within the SLA. The performance isn't random, and an SLA is generally an upper bound. Do take the time to optimize your performance-sensitive queries. You should easily be able to do this sort of lookup consistently in sub-second time.

EDIT:

Not sure which language you're working in, but here's a quick Node.js test that seems to usually take between 1 and 1.2 seconds from my house, but occasionally closer to 1.5:

// Quick test: insert 50 entities, then fetch them back with 50 parallel
// point queries. Requires the 'azure', 'q', and 'underscore' packages.
function timeParallelQueries(account, key) {
    var azure = require('azure'),
        Q = require('q'),
        _ = require('underscore');

    var tables = azure.createTableService(account, key);

    function convertToString(n) { return n + ''; }

    var start = null;

    Q.ncall(tables.createTableIfNotExists, tables, 'test')
    .then(function () {
        // Insert 50 entities with PartitionKey === RowKey === '0'..'49'.
        return Q.all(_.map(_.map(_.range(50), convertToString), function (key) {
            return Q.ncall(tables.insertOrReplaceEntity, tables, 'test', {PartitionKey: key, RowKey: key});
        }));
    })
    .then(function () {
        // Time 50 point queries (one PK/RK pair each) issued in parallel.
        start = new Date();
        return Q.all(_.map(_.map(_.range(50), convertToString), function (key) {
            return Q.ncall(tables.queryEntity, tables, 'test', key, key);
        }));
    })
    .then(function (results) {
        console.log(results);
        console.log('Took ' + (new Date() - start) + 'ms.');
    })
    .fail(console.error);
}
1 vote

Besides the fact that you are querying for "known" PK & RK pairs, you are combining a lot of them with OR. Given that different partitions are quite likely to be spread across different physical servers, I am not surprised by the results.

Also according to the Storage SLA, a table operation:

Must complete processing or return a continuation within 10 seconds

While an operation on a single entity (that is single pair PK & RK):

Must complete processing within 2 seconds

So 5 seconds is about average and within the SLA. Even if you speed up your query somehow, the result will not be reliable, in the sense that the SLA for your query is still "within 10 seconds". Any effort you put into optimizing this query may be wasted, since the time is variable and depends on many factors: what takes 3 seconds today may take 8 seconds tomorrow and still be within the SLA.

I would not go and dig deeper in something that is within SLA.

UPDATE 1

There are a number of other ways to minimize page load time. You can start thinking asynchronously! Send clean, data-free HTML to the client and load all your data on demand via AJAX after the page loads.

Also think about caching. You can cache (almost) any type of data that is presented to the user, with the trade-off of data accuracy vs. load speed. You can cache, or even pre-cache, some of the data that is to be loaded. I think this would be a good fit for your scenario: since you know the PK & RK pairs you are looking for, cache those entries and serve them from the cache instead of going to the table on every request. You can set either absolute or sliding expiration, depending on how likely the data is to change.
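To illustrate the sliding-expiration idea, here is a minimal in-memory cache sketch. The TTL value and the `getEntityCached` wrapper are illustrative assumptions, not part of any Azure SDK; only `tables.queryEntity` comes from the example above.

```javascript
// Minimal in-memory cache with sliding expiration (illustrative sketch).
function createSlidingCache(ttlMs) {
    var entries = {};                            // key -> { value, expires }

    return {
        get: function (key) {
            var entry = entries[key];
            if (!entry || Date.now() > entry.expires) {
                delete entries[key];             // drop stale entries lazily
                return undefined;
            }
            entry.expires = Date.now() + ttlMs;  // sliding: refresh on read
            return entry.value;
        },
        set: function (key, value) {
            entries[key] = { value: value, expires: Date.now() + ttlMs };
        }
    };
}

var cache = createSlidingCache(60 * 1000);       // assumed 60-second window

// Hypothetical wrapper: consult the cache before hitting Table Storage.
function getEntityCached(tables, table, pk, rk, callback) {
    var key = table + '/' + pk + '/' + rk;
    var hit = cache.get(key);
    if (hit) { return callback(null, hit); }
    tables.queryEntity(table, pk, rk, function (err, entity) {
        if (!err) { cache.set(key, entity); }
        callback(err, entity);
    });
}
```

With this in place, repeated lookups for the same PK/RK pair within the expiration window never touch Table Storage at all.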

UPDATE 2

As Gaurav mentioned, you can try querying the tables in parallel and putting the results into the cache. However, the degree of parallelism depends on the number of cores you are running on, so there is no point in issuing parallel queries if you are on a single core. Still, consider caching and client-side data binding!