What is the disadvantage to unique partition keys?

Question

My data set will only ever be directly queried (meaning I am looking up a specific item by some identifier) or will be queried in full (meaning return every item in the table). Given that, is there any reason to not use a unique partition key?

From what I have read (e.g.: https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#choosing-an-appropriate-partitionkey) the advantage of a non-unique partition key is being able to do transactional updates. I don't need transactional updates in this data set so is there any reason to partition by anything other than some unique thing (e.g., GUID)?

Assuming I go with a unique partition key per item, this means that each partition will have one row in it. Should I repeat the partition key in the row key or should I just have an empty string for a row key? Is a null row key allowed?

A null row key is not allowed. It sounds like you want to keep all data within the same partition, where each unique row within the partition will have a unique row key (it will have to). Probably difficult to determine without knowing a little more about your data, but if you are returning every row, then having them spread across partitions could impact performance. — Brendan Green
@BrendanGreen I thought spreading them out would improve performance because each partition could be reading data in parallel? — Micah Zoltu
@BrendanGreen a null or an empty RowKey is indeed allowed. I'm using it a lot in my application. — Gaurav Mantri
Ok - I think I may have interpreted the requirement that a rowkey must be present for insert, update and delete operations as also being non-null. — Brendan Green

Emily Gerner · Accepted Answer · 2015-09-11T21:11:21

Zhaoxing's answer is essentially correct but I want to expand on it so you can understand a bit more why.

A table partition is defined as the table name plus the partition key. A single server can have many partitions, but a partition can only ever be on one server.

This fundamental design means that access to entities stored in a single partition cannot be load-balanced because partitions support atomic batch transactions. For this reason, the scalability target for an individual table partition is lower than for the table service as a whole. Spreading entities across many partitions allows Azure storage to scale your load much better.

Point queries are optimal, which is great because it sounds like that's what you will be doing a lot of. If the partition key has no logical meaning (i.e., you won't want all the entities in a particular partition), you're best off splitting out to many partition keys. Listing all entities in a table will always be slower because it's a scan. Azure Storage will return a continuation token if the query hits a timeout, 1000 entities, or a server boundary (as discussed above). Many of the storage client libraries have convenience methods that help you handle this by automatically following these tokens as you iterate through the list.
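To make the continuation-token behaviour concrete, here is a minimal sketch in plain Python. It is a simulation only, not the Azure SDK: `fetch_page`, `list_all`, and the integer token are hypothetical stand-ins, and the only figure taken from the answer is the 1000-entity cap per response. Real tokens are opaque values, and a real response can also stop early on a timeout or a server boundary.

```python
from typing import Dict, Iterator, List, Optional, Tuple

PAGE_LIMIT = 1000  # per-response entity cap mentioned in the answer

Entity = Dict[str, str]


def fetch_page(
    table: List[Entity], token: Optional[int]
) -> Tuple[List[Entity], Optional[int]]:
    """Hypothetical stand-in for one scan request against the table.

    Returns at most PAGE_LIMIT entities plus a continuation token,
    or None as the token when the scan is complete.
    """
    start = token or 0
    end = start + PAGE_LIMIT
    page = table[start:end]
    next_token = end if end < len(table) else None
    return page, next_token


def list_all(table: List[Entity]) -> Iterator[Entity]:
    """Follow continuation tokens until the service reports no more data."""
    token: Optional[int] = None
    while True:
        page, token = fetch_page(table, token)
        yield from page
        if token is None:
            break


# Usage: 2500 single-entity partitions take three round trips to list.
table = [{"PartitionKey": f"item-{i}", "RowKey": ""} for i in range(2500)]
everything = list(list_all(table))
```

This loop is roughly what the client-library convenience methods do for you, so a full listing costs at least one round trip per page regardless of how the keys are chosen.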

TL;DR: With the information you've given, I'd recommend a unique partition key per item. Null row keys are not allowed, but you can construct the row key any other way you like.
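As a rough illustration of that recommendation, the sketch below builds an entity with a GUID partition key and a non-null row key. The helper names `make_entity` and `point_key` are hypothetical, the entity is a plain dictionary rather than an SDK type, and the empty-string default for the row key is an assumption based on the comment in this thread reporting that an empty RowKey works.

```python
import uuid
from typing import Any, Dict, Optional, Tuple


def make_entity(data: Dict[str, Any], row_key: Optional[str] = "") -> Dict[str, Any]:
    """Build an entity whose partition holds exactly one row.

    The PartitionKey is a fresh GUID. The RowKey defaults to an empty
    string (assumed to be accepted, per the comments above); None is
    rejected because null row keys are not allowed.
    """
    if row_key is None:
        raise ValueError("RowKey must be a string, not None")
    entity = dict(data)  # copy so the caller's dict is not mutated
    entity["PartitionKey"] = str(uuid.uuid4())
    entity["RowKey"] = row_key
    return entity


def point_key(entity: Dict[str, Any]) -> Tuple[str, str]:
    """Return the (PartitionKey, RowKey) pair a point query looks up."""
    return entity["PartitionKey"], entity["RowKey"]


# Usage: store the GUID alongside your item so you can look it up later.
entity = make_entity({"Name": "example"})
pk, rk = point_key(entity)
```

If you would rather not rely on an empty row key, passing the GUID again as `row_key` is an equally valid choice under this scheme; either way a point query needs both values.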

Reading:

Azure Storage Table Design Guide

Azure Storage Performance Check List
