Azure table storage - partitionkey design for fast queries

Question

We are storing our log entries into Azure table storage. We are storing the following properties:

date (DateTime)
tenantId (long)
username (string)
severity (int - enumeration of Error (0), Warning (1), Info (2), Debug(3))
... other properties which are not important right now

There's a huge amount of log entries and we would like to query the logs fast according to following columns:

date (from/to)
tenantId (equals)
severity (from/to)
username (equals)

So we designed our rowKey to be date + guid to guarantee uniqueness. And now we need to design PartitionKey. We need to be able to query according to tenantId, username and severity. So we need to get them to PartitionKey I suppose.

Following format: {severity}-{tenantId}-{username} seems to be a candidate (example of a key is [email protected] which means severity: Warning, tenantId: 500, user: [email protected]). For instance I would like to query for:

Debug and Info severity

that's fine because I can write PartitionKey >= 2 AND PartitionKey < 4, - it will reduce all results of other severity.

But if I need to enhance this query for tenantId = 500 (so Debug and Info for tenantId 500 only), I need to write PartitionKey >= 2-500 AND PartitionKey < 2-501 AND PartitionKey >= 3-500 AND PartitionKey < 3-501

So it seems possible, but query gets more complicated. I know I can only benchmark to test its performance, but the question is whether is there any better design of RowKey and PartitionKey to achieve better performance.

Gaurav Mantri Gaurav Mantri · Accepted Answer · 2017-07-31T15:17:21

Recommended approach would be to add multiple rows for the same data with different PartitionKeys. For example, if you have following data:

date: 2017-07-31

tenantId: 3

severity: 2

username: [email protected]

Here's how your data should be stored in the table:

PartitionKey    RowKey  date        tenantId    severity    user
----------------------------------------------------------------------
D|2017-07-31    GUID()  2017-07-31  500         2           [email protected]
T|500           GUID()  2017-07-31  500         2           [email protected]
S|2             GUID()  2017-07-31  500         2           [email protected]
U|[email protected]     GUID()  2017-07-31  500         2           [email protected]

So when you need to query on say tenant id and severity, you would do something like:

PartitionKey eq 'T|500' and severity eq 2

or

PartitionKey eq 'S|2' and tenant eq 500

Azure table storage - partitionkey design for fast queries

1 Answers