2
votes

I am working with Azure CosmosDB, and more specifically with the Gremlin API, and I am a little bit stuck as to what to select as a partition key.

Indeed, since I'm using graph data, not all vertices follow the same data schema. If I select a property that not all vertices have in common, Azure won't let me store vertices which don't have a value for the partition key. The problem is, the only property they all have in common is /id, but Azure doesn't allow for this property to be used as a partition key.

Does that mean I need to create a property that all my vertices will have in common ? Doesn't that kill a little bit the purpose of graph data ? Or is there something I'm missing out ?

For example, in my case, I want to model an object and its parts. Each object and each part have a property /identificationNumber. Would it be better to use this property as a parition key, or to create a new property /partitionKey dedicated to the purpose of partitioning ? My concern is that, if I select /identificationNumber as the partition key, and if my data model has to evolve in the future, if I have to model new objects without an /identificationNumber, I will have to artificially add this property to these objects the data model, which might lead to some confusion.

1
By definition, all items in a partitioned database need to possess a partition key, hence that's inherently a common property, even if that's just the id or a copy/derivation of it. It would help to share actual examples of your data to get relevant advice on possible partition keys.Noah Stahl
I can't post the data schema for privacy reasons, but I will try to come up with a similar example.MarleneHE

1 Answers

1
votes

Creating a dedicated property to use as a synthetic partition key is a good practice if there isn't an obvious existing property to use. This approach can mitigate cases where you don't have an /identificationNumber in some objects, since you can assign some other value as the partitionKey in those cases. This also allows flexibility around refactoring /identificationNumber in the future, since partitionKey is what needs to be unchanging.

We shouldn't be concerned about an "artificial property" because this is inherent with using a partitioned database. It doesn't need to be exposed to users, but devs need to understand Cosmos is somewhat different than traditional DBs. It's also possible to migrate to a new partition key by copying all data to a new container, in the worst case of regret down the road. It's probably best to start working on the project with a best guess and seeing how things work, and perhaps iterating on different ideas to compare performance etc.