7 votes

I am trying to determine a good strategy for storing logging information in Azure Table Storage. I have the following:

PartitionKey: The name of the log.

RowKey: Inverted DateTime ticks (so the most recent entries sort first).
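In code, the keys are built roughly like this (the 19-digit zero-padding is my illustration, not a requirement):

string partitionKey = "MyApiLogs";
// Inverted ticks: later events produce smaller values, so the newest entries sort first.
// Zero-pad to 19 digits so that lexical order matches numeric order.
string rowKey = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("D19");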

The only issue here is that partitions could get very large (millions of entities), and they will only keep growing over time.

That being said, every query being performed will always include the PartitionKey (no table scan) AND a RowKey range filter (a small scan within the partition).

For example (in natural language):

where `PartitionKey` = "MyApiLogs"
and `RowKey` is between "01-01-15 12:00" and "01-01-15 13:00"
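In code, that query would look roughly like this (using the classic WindowsAzure.Storage SDK; the CloudTable reference 'table' and the zero-padded inverted-ticks format are assumptions on my part):

using System;
using Microsoft.WindowsAzure.Storage.Table;

// With inverted ticks, the later time (13:00) produces the smaller RowKey.
string lowerKey = (DateTime.MaxValue.Ticks - new DateTime(2015, 1, 1, 13, 0, 0, DateTimeKind.Utc).Ticks).ToString("D19");
string upperKey = (DateTime.MaxValue.Ticks - new DateTime(2015, 1, 1, 12, 0, 0, DateTimeKind.Utc).Ticks).ToString("D19");

string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "MyApiLogs"),
    TableOperators.And,
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, lowerKey),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, upperKey)));

foreach (DynamicTableEntity entity in table.ExecuteQuery(new TableQuery<DynamicTableEntity>().Where(filter)))
{
    // Process one hour of log entries here.
}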

Provided that the query filters on both PartitionKey and RowKey, I understand that the size of the partition shouldn't matter.

With this design, you are still doing a scan, even though it is within a partition. How about creating a separate table for each log type? – Gaurav Mantri

@GauravMantri: Do you mean separate partitions for each log type, or entirely separate tables? – Dave New

I meant separate tables. – Gaurav Mantri

This question has very rich answers: stackoverflow.com/questions/29842478/… – Korayem

2 Answers

10 votes

Take a look at our new Table Design Patterns Guide, specifically the log-data anti-pattern, as it talks about this exact scenario and the alternatives. When people write log entities they often use a date for the PartitionKey, which makes that partition hot because all writes go to it. Quite often blobs end up being a better destination for log data, since people typically end up processing the logs in batches anyway; the guide discusses this as an option.
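For illustration, here is a rough sketch of the blob option using the classic WindowsAzure.Storage SDK (the container name, the hourly blob layout, and the 'connectionString'/'message' variables are assumptions for the example, not prescribed by the guide):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("logs");
container.CreateIfNotExists();

// One append blob per hour spreads writes out and keeps batch processing simple.
CloudAppendBlob blob = container.GetAppendBlobReference(
    string.Format("MyApiLogs/{0:yyyy/MM/dd/HH}.log", DateTime.UtcNow));
if (!blob.Exists())
    blob.CreateOrReplace();

blob.AppendText(string.Format("{0:o}\t{1}\n", DateTime.UtcNow, message));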

0 votes

Adding my own answer so people can have something inline without needing external links.

You want the partition key to be the timestamp plus the hash code of the message. This is good enough in most cases. You can also mix the hash code(s) of any additional key/value pairs into the message hash if you want, but I've found it's not really necessary.

Example:

// Classic SDK: requires using Microsoft.WindowsAzure.Storage.Table;
// Timestamp prefix plus the message hash spreads writes across many small partitions.
string partitionKey = DateTime.UtcNow.ToString("o").Trim('Z', '0') + "_" + ((uint)message.GetHashCode()).ToString("X");
string rowKey = logLevel.ToString();
DynamicTableEntity entity = new DynamicTableEntity { PartitionKey = partitionKey, RowKey = rowKey };
// Add any additional key/value pairs from the log call to the entity, e.g. entity["key"] = new EntityProperty(value);
// Use InsertOrMerge to add the entity ('table' is your CloudTable reference):
table.Execute(TableOperation.InsertOrMerge(entity));

When querying logs, use a partition key filter that starts at the beginning of the window you want to retrieve, usually something like one minute or one hour before the current date/time. You can then page backwards another minute or hour by shifting the window earlier. This avoids the weird date/time hack of subtracting the timestamp from DateTime.MaxValue.
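For example, a rough sketch of pulling the last hour of logs under this scheme ('table' is an assumed CloudTable reference; the lexical range on PartitionKey works because each key starts with the ISO 8601 timestamp):

using System;
using Microsoft.WindowsAzure.Storage.Table;

string fromKey = DateTime.UtcNow.AddHours(-1).ToString("o");
string toKey = DateTime.UtcNow.ToString("o");

string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, fromKey),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, toKey));

foreach (DynamicTableEntity logEntry in table.ExecuteQuery(new TableQuery<DynamicTableEntity>().Where(filter)))
{
    // To page backwards, shift 'fromKey' and 'toKey' another hour earlier and re-run.
}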

If you get extra fancy and put a search service on top of the Azure Table storage, then you can look up key/value pairs quickly.

This will be much cheaper than Application Insights if you are using Azure Functions (in which case I would suggest disabling Application Insights). If you need multiple log names, just add another table.