3
votes

When storing a one to many relationship in Azure Table Storage (such as storing the ID of the owner of a record) do you store both the PartitionKey and RowKey as two separate fields? Or do you somehow concatenate those two fields into one field for the purpose of simpler storage?


EDIT - more clarity on what I'm asking

I know table storage is not relational. And I'm not looking for foreign key integrity or cascading deletes or anything like that.

But even without being "relational" there is still a very common need to store a pointer to a record in another table. This has thousands of uses. My example of storing the user who created the record is just one example. Storing any sort of list or array is another example (such as a list of "Work Experience" records in a resume, or a list of contact addresses for a Person record.)

Because the "primary key" of an Azure Table Storage table is two fields, I'm wondering if there is a common convention on how to store that information. What is the "Best Practice?"

The options I see are:

  • Concatenate PartitionKey and RowKey into one field and store that as the "Foreign Key" (I know there are no real foreign keys).
  • Store both fields separately and use those as the foreign key.
  • A third option I haven't thought of.

Is there a "best practice" here? Is there a reason to choose one method over another?

3

3 Answers

4
votes

Here's a collection of "Best Practicies" working with Azure Tables: https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/

The "One-to-many relationships" chapter may be useful.

I'd suggest to denormalize your data and keep records that represent one-to-many relationship in single table. Denormalize considering how you're going to query the data.

3
votes

First thing is that Azure Table Storage is not Relational Database that natively supports One-To-Many relationships. Anyways you can develop this scenario as you mentioned. The key point is that PartitionKey + RowKey must be Unique in a Table. So in a given partition (represented by PartitionKey), you cannot duplicate a RowKey.

So I guess concatenation is the right way. Otherwise if you use PartitionKey and RowKey from main table as PartitionKey and RowKey in second table then you won't be able to have One-To-Many relationship. Because One record in main table will have Many records in second table and for all these records in second table you cannot have same PartitionKey and RowKey values.

Please note that concatenating PartitionKey and RowKey will result in separate partitions in second table for every record in main table. And if you have a scenario where you want to query multiple partitions in second table, then it will be slow.

The most important point is that when designing structure of Tables Storage the main consideration is Data Access Pattern. So you should see how will you access the data and design the structure accordingly.

Hope this helps.

2
votes

You have to be careful concatenating keys from parent tables. You really have to look at your keys as to whether they are viable.

We had a recent project where we stored relational data in ATS, it worked because our keys were either alphanumeric (A-Za-z0-9) (you can't have certain symbols in the PK/RK) or guids (with the dashes stripped out) split with an underscore. Have a look at the characters that you can use here. We could guarantee uniqueness of the keys with this model.

example

Customer: PK "business" RK "A0001"

Product: PK "business_A0001" RK "somesequentialstringguid"

InstanceOfProduct: PK "business_A0001_somesequentialstringguid" RK "somesequentialstringguid"

The other big problem is how do you wish to query your data? Gathering data from multiple products would be annoying with this model (multiple requests etc or have requests cross partition boundaries with continuation keys), but it's lightning fast to pull single instances/partitions. This was more important for us so that's what we went for.

my 2 cents HTH