
My data set will only ever be directly queried (meaning I am looking up a specific item by some identifier) or will be queried in full (meaning return every item in the table). Given that, is there any reason to not use a unique partition key?

From what I have read (e.g., https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#choosing-an-appropriate-partitionkey), the advantage of a non-unique partition key is being able to do transactional updates. I don't need transactional updates in this data set, so is there any reason to partition by anything other than some unique value (e.g., a GUID)?

Assuming I go with a unique partition key per item, this means that each partition will have one row in it. Should I repeat the partition key in the row key or should I just have an empty string for a row key? Is a null row key allowed?

A null row key is not allowed. It sounds like you want to keep all data within the same partition, where each row within the partition will have a unique row key (it will have to). It's probably difficult to determine without knowing a little more about your data, but if you are returning every row, then having them spread across partitions could impact performance. - Brendan Green
@BrendanGreen I thought spreading them out would improve performance because each partition could be read in parallel? - Micah Zoltu
@BrendanGreen a null or an empty RowKey is indeed allowed. I'm using it a lot in my application. - Gaurav Mantri
OK, I think I may have interpreted the requirement that a RowKey must be present for insert, update and delete operations as also meaning it must be non-null. - Brendan Green

3 Answers

3 votes

Zhaoxing's answer is essentially correct, but I want to expand on it so you can understand a bit more about why.

A table partition is defined as the table name plus the partition key. A single server can have many partitions, but a partition can only ever be on one server.

This fundamental design means that access to entities stored in a single partition cannot be load-balanced because partitions support atomic batch transactions. For this reason, the scalability target for an individual table partition is lower than for the table service as a whole. Spreading entities across many partitions allows Azure storage to scale your load much better.

Point queries are optimal, which is great because it sounds like that's what you will be doing most of the time. If the partition key has no logical meaning (i.e., you will never want all the entities in a particular partition), you're best off splitting entities across many partition keys. Listing all entities in a table will always be slower because it's a scan. Azure Storage returns a continuation token when a query hits a timeout, 1,000 entities, or a server boundary (as discussed above). Many of the storage client libraries have convenience methods that handle this for you by automatically following these tokens as you iterate through the results.
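The token-following loop those convenience methods perform can be sketched in plain Python. This is a minimal simulation, not the Azure SDK: `query_page` is a hypothetical stand-in for one request to the service, and its integer offset token replaces the real, opaque tokens (the service actually hands back next-partition-key and next-row-key values). With the `azure-data-tables` package, simply iterating over `TableClient.list_entities()` does this paging for you.

```python
from typing import Dict, Iterator, List, Optional, Tuple

PAGE_LIMIT = 1000  # the service returns at most 1,000 entities per response


def query_page(
    entities: List[Dict[str, str]], token: Optional[int]
) -> Tuple[List[Dict[str, str]], Optional[int]]:
    """Hypothetical stand-in for one query request.

    Returns one page of results plus a continuation token, or None when
    there is nothing left. Here the token is just an offset into the list.
    """
    start = token or 0
    end = min(start + PAGE_LIMIT, len(entities))
    next_token = end if end < len(entities) else None
    return entities[start:end], next_token


def list_all(entities: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Keep re-issuing the query with the latest token until none is returned."""
    token: Optional[int] = None
    while True:
        page, token = query_page(entities, token)
        yield from page
        if token is None:
            break
```

A scan of 2,500 entities under this model takes three requests (1,000 + 1,000 + 500), but the caller of `list_all` just sees one stream of entities.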

TL;DR: With the information you've given, I'd recommend a unique partition key per item. Null row keys are not allowed, but you can construct the row key however else you like.
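Following the TL;DR, an entity can use a fresh GUID as its PartitionKey and an empty string as its RowKey (the comments on the question note that an empty RowKey is accepted while a null one is not). The sketch below builds such an entity as a plain dict; `make_entity` is a hypothetical helper. With the `azure-data-tables` package, a dict like this could be passed to `TableClient.create_entity`, and a point query would call `TableClient.get_entity` with both keys.

```python
import uuid
from typing import Dict


def make_entity(payload: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: one entity per partition.

    PartitionKey is a new GUID, so every entity lands in its own partition.
    RowKey is the empty string, since a second unique key would be redundant
    when each partition holds exactly one row. The key fields are written
    last so the payload cannot overwrite them.
    """
    return {**payload, "PartitionKey": str(uuid.uuid4()), "RowKey": ""}
```

To look the item up later, the caller must keep the generated PartitionKey (it is the item's identifier in this scheme).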

Reading:

Azure Storage Table Design Guide

Azure Storage Performance Check List

1 vote

If you don't need an EntityGroupTransaction to update entities in a batch, unique partition keys are a good option for you.
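The reason unique partition keys rule out batching is that an entity group transaction may only touch entities that share one PartitionKey (and at most 100 of them). The check below is a hypothetical sketch of those two rules only; it deliberately ignores the payload size limit and the rule that an entity may appear only once per transaction.

```python
from typing import Dict, List

MAX_BATCH_OPERATIONS = 100  # upper limit on entities per entity group transaction


def can_batch(entities: List[Dict[str, str]]) -> bool:
    """Hypothetical check: could these entities go in one group transaction?

    Only the shared-PartitionKey and entity-count rules are checked.
    """
    if not entities or len(entities) > MAX_BATCH_OPERATIONS:
        return False
    first_pk = entities[0]["PartitionKey"]
    return all(e["PartitionKey"] == first_pk for e in entities)
```

With one GUID per entity, any batch of two or more entities fails the shared-key rule, which is exactly the trade-off the question accepts.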

0 votes

I think the Table service's auto-scaling may not work well in this case. When some of the data in a partition is 'hot', the Table service moves it to another cluster to improve performance. But since you have a unique partition key per entity, probably none of your entities will be identified as 'hot', whereas if you grouped them into partitions, some partitions would become 'hot' and be moved. The same problem may also arise if you use a single static partition key.

In addition, the Table service may return only part of a query's results when:

  1. More than 1000 entities in result.
  2. Partition boundary is crossed.

You also need a full query (return all entities). If you use a unique partition key, each entity is its own partition, so your query will return only one entity plus a continuation token, and you'll need to send another query with that token to retrieve the next entity. I don't think that's what you want.
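The cost this answer describes can be put in numbers with a worst-case model. The sketch assumes, as the answer does, that every partition boundary ends a response; the first answer instead describes tokens at server boundaries, and a single server can hold many partitions, so real round-trip counts may be much lower. `round_trips` is a hypothetical helper, not an Azure API.

```python
import math
from collections import Counter
from typing import List

PAGE_LIMIT = 1000  # at most 1,000 entities per response


def round_trips(partition_keys: List[str]) -> int:
    """Count requests for a full scan under the worst-case assumption.

    Assumes a response never spans two partitions and is capped at
    PAGE_LIMIT entities, so each partition costs ceil(size / PAGE_LIMIT)
    requests.
    """
    sizes = Counter(partition_keys)  # number of entities per partition
    return sum(math.ceil(n / PAGE_LIMIT) for n in sizes.values())
```

Under this model, 5,000 entities with unique keys need 5,000 requests, while the same entities spread over 10 partitions of 500 need only 10.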

So my suggestion is to choose a reasonable partition key in any case, even if it seems useless for your business logic, because it helps the Table service optimize your data.
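One common way to follow this advice while keeping point lookups cheap is to derive a bucket from the item's id and use it as the partition key, with the id itself as the row key. This is a swapped-in technique, not something the answer specifies: `NUM_BUCKETS`, `bucket_for` and `make_bucketed_entity` are hypothetical names, and the id is assumed to contain none of the characters disallowed in keys (`/`, `\`, `#`, `?`).

```python
import hashlib
from typing import Dict

NUM_BUCKETS = 16  # hypothetical value; tune to data volume and access pattern


def bucket_for(item_id: str) -> str:
    """Derive a stable two-digit partition key ("00".."15") from the id.

    hashlib is used instead of hash(), because Python's built-in str hash
    is randomized per process and would map ids to different buckets
    across runs.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return f"{int.from_bytes(digest[:4], 'big') % NUM_BUCKETS:02d}"


def make_bucketed_entity(item_id: str, payload: Dict[str, str]) -> Dict[str, str]:
    """Partition by bucket; the id is the (unique) RowKey within its bucket."""
    return {**payload, "PartitionKey": bucket_for(item_id), "RowKey": item_id}
```

A point lookup recomputes `bucket_for(item_id)` and queries on both keys, so it stays a point query; a full scan crosses at most `NUM_BUCKETS` partition boundaries. Fewer buckets mean fewer boundaries during a scan, while more buckets spread load more widely.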