0
votes

According to the docs, the id property is special in Azure Cosmos DB documents: it must always be set and must have a unique value per partition. It also has additional restrictions on its content:

The following characters are restricted and cannot be used in the Id property: '/', '\', '?', '#'
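To make the restriction concrete, here is a minimal client-side check for the four characters quoted above. The helper names (`find_restricted_chars`, `is_valid_id`) are made up for illustration and are not part of any Cosmos DB SDK. The empty-string check is an assumption based on the id always having to be set.

```python
# Characters the docs list as not allowed in the id property.
RESTRICTED_ID_CHARS = frozenset("/\\?#")


def find_restricted_chars(candidate_id):
    """Return the restricted characters present in candidate_id, sorted."""
    return sorted({ch for ch in candidate_id if ch in RESTRICTED_ID_CHARS})


def is_valid_id(candidate_id):
    """True if the id is non-empty and contains no restricted characters.

    Assumption: an empty id is rejected here because the id must always be set.
    """
    return bool(candidate_id) and not find_restricted_chars(candidate_id)
```

A check like this only covers the character rule; uniqueness per partition is still enforced by the server.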

Obviously, this field is one of the document "keys" (in addition to _rid) and is used somehow in the internal plumbing. Other than the restrictions above, it is unclear how exactly this key is used internally and, more importantly for practitioners, which values make technically better ids than others.

Wild guess 1: In some database systems, one would prefer short primary key values, since the PK is included in index entries and shorter keys allow a more compact index for storage and lookup. Does the length of the id field matter at all, besides the one-time storage cost?

Wild guess 2: In some systems, better throughput is achieved if common prefixes are avoided in names (e.g. Azure Storage container/blob names), and some guidance even suggests adding a small random hash as a prefix. Does Cosmos DB care about id prefix similarities?

Anything else one should consider?

EDIT: Clarification: I'm interested in what is good for the Cosmos DB server storage/execution side, given that my data model is still being designed and/or has multiple candidate keys the data designer can choose from.


1 Answer

1
votes

First and foremost, let's clear something up. The id property is NOT unique across the collection. Your collection can have multiple documents that have the exact same id. The id is ONLY unique within its own logical partition.
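The uniqueness rule can be pictured with a toy in-memory model in which a document is identified by the pair (partition key value, id). This is only an illustration of the rule as stated; `ToyCollection` is a hypothetical class and says nothing about how Cosmos DB implements it internally.

```python
class ToyCollection:
    """Toy model: documents are keyed by (partition key value, id)."""

    def __init__(self):
        self._docs = {}

    def create(self, partition_key, doc_id, body):
        key = (partition_key, doc_id)
        if key in self._docs:
            # Same id inside the same logical partition is a conflict.
            raise ValueError("id %r already exists in partition %r"
                             % (doc_id, partition_key))
        self._docs[key] = body

    def count(self):
        return len(self._docs)


collection = ToyCollection()
collection.create("tenant-a", "42", {"name": "first"})
# Same id, different logical partition: accepted.
collection.create("tenant-b", "42", {"name": "second"})
```

Calling `collection.create("tenant-a", "42", ...)` a second time would raise, because that pair is already taken.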

That said, based on everything we know from the documentation and talks, it doesn't really matter which value you choose. The id is a string and Cosmos DB treats it as such, but it is also considered a "primary key" internally, so some restrictions apply, such as ordering by it.

Where it does matter is in your consuming application's business logic. The id plays a double role: it is both a Cosmos DB property and your own property. You get to set it. This is the value you will use, together with the partition key value, to make direct reads against the database. If you look a document up by any other value, it is no longer a read; it is a query, which makes it more expensive and slower.
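As a rough sketch of the difference, assuming the azure-cosmos Python SDK and a `container` object obtained from it (a `ContainerProxy`), a direct read and a query look like this. The `email` property and both helper names are hypothetical, chosen only for illustration.

```python
def read_by_id(container, item_id, partition_key):
    # Direct (point) read: the document is addressed by its id plus the
    # partition key value, so no query has to be evaluated.
    return container.read_item(item=item_id, partition_key=partition_key)


def find_by_email(container, email):
    # Lookup by any other property goes through the query engine.
    # "email" is a hypothetical property used only for this example.
    results = container.query_items(
        query="SELECT * FROM c WHERE c.email = @email",
        parameters=[{"name": "@email", "value": email}],
        enable_cross_partition_query=True,
    )
    return list(results)
```

If the application already knows the entity's id and partition key value, `read_by_id` is the call to prefer; `find_by_email` is what you fall back to when it only knows some other property.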

A good value to set is the id of the entity that is hosted in this collection. That way you can use the entity's id to read quickly and efficiently.