We are storing our log entries into Azure table storage. We are storing the following properties:
- date (DateTime)
- tenantId (long)
- username (string)
- severity (int - enumeration of Error (0), Warning (1), Info (2), Debug(3))
- ... other properties which are not important right now
There's a huge amount of log entries and we would like to query the logs fast according to following columns:
- date (from/to)
- tenantId (equals)
- severity (from/to)
- username (equals)
So we designed our rowKey to be date + guid to guarantee uniqueness. And now we need to design PartitionKey. We need to be able to query according to tenantId, username and severity. So we need to get them to PartitionKey I suppose.
Following format: {severity}-{tenantId}-{username} seems to be a candidate (example of a key is [email protected] which means severity: Warning, tenantId: 500, user: [email protected]). For instance I would like to query for:
- Debug and Info severity
that's fine because I can write PartitionKey >= 2 AND PartitionKey < 4
, - it will reduce all results of other severity.
But if I need to enhance this query for tenantId = 500 (so Debug and Info for tenantId 500 only), I need to write PartitionKey >= 2-500 AND PartitionKey < 2-501 AND PartitionKey >= 3-500 AND PartitionKey < 3-501
So it seems possible, but query gets more complicated. I know I can only benchmark to test its performance, but the question is whether is there any better design of RowKey and PartitionKey to achieve better performance.