0
votes

I'm trying to repeatedly insert about 850 documents of 100 - 300 KB each into a Cosmos DB collection. They all share the same partition key.

The estimator suggests that 50K RUs should handle this in short order, but even at well over 100K RUs it's averaging 20 minutes or so per set rather than something more reasonable.

Should I have unique partition keys for each document? Is the problem that, with all the documents going to the same partition key, they are being handled in series and the capacity isn't load leveling? Will using the bulk executor fix this?

1
No specific answer but... your RUs will be divided across physical partitions. So 50K RU is split across (I believe) 5 partitions to start. That said: are you seeing 429s (meaning throttling because you're hitting the RU/sec threshold)? Also note: you can reduce write costs with a custom index policy that only indexes the properties you will be searching on. By default, everything is indexed. - David Makogon
Hi, any progress? Does my answer help you? - Jay Gong
Turns out my issue was with my Index. Evaluating the default indexes for my collection was taking 100 to 1000x more RUs than actually writing the file. - A.Rowan
@A.Rowan Hi, Rowan. I summarized your solution in my answer. If you don't mind, you could mark it as the answer for others' reference. Thanks a lot! - Jay Gong
@JayGong Definitely upvoting, but I was looking for confirmation on the impact of partitioning on inserting data, or better yet, an easy way to work with data that is already partitioned improperly for write but is optimized for read access. - A.Rowan

1 Answer

1
votes

Should I have unique partition keys for each document? Is the problem that, with all the documents going to the same partition key, they are being handled in series and the capacity isn't load leveling?

You can find the statement below in this doc.

To fully utilize throughput provisioned for a container or a set of containers, you must choose a partition key that allows you to evenly distribute requests across all distinct partition key values.

So, defining a good partition key helps both inserts and queries. However, choosing the partition key is really worth digging into; please refer to this doc for guidance on choosing your partition key. A sketch of a container partitioned on a high-cardinality property follows below.
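As a rough illustration only (not from the original post), here is a minimal sketch using the Python SDK (azure-cosmos); the account URL, key, database and container names are placeholders, and partitioning on /id is just one example of a high-cardinality key that spreads writes across physical partitions instead of funnelling them into a single logical partition.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key - substitute your own account details.
client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/", credential="<your-key>"
)
database = client.create_database_if_not_exists("mydb")

# Partition on a property with many distinct values (here /id, purely for
# illustration) so the provisioned RUs are usable across all partitions.
container = database.create_container_if_not_exists(
    id="mycollection",
    partition_key=PartitionKey(path="/id"),
    offer_throughput=50000,
)
```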

Will using the bulk executor fix this?

Yes, and you can use a continuation token with bulk inserts. For more details, please refer to my previous case: How do I get a continuation token for a bulk INSERT on Azure Cosmos DB?
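Note the BulkExecutor itself is a .NET/Java library; as a hedged Python-side approximation (not the answer's original suggestion), you can at least stop the inserts from running strictly one after another by issuing them concurrently with the async client. Account details and the documents list are placeholders.

```python
import asyncio
from azure.cosmos.aio import CosmosClient

async def insert_all(documents):
    # Placeholder endpoint, key, database and container names.
    async with CosmosClient(
        "https://<your-account>.documents.azure.com:443/", credential="<your-key>"
    ) as client:
        container = client.get_database_client("mydb").get_container_client("mycollection")
        # Fire the inserts concurrently so one slow or throttled call does not
        # serialize the whole batch.
        await asyncio.gather(*(container.create_item(doc) for doc in documents))

# Example usage (documents is your list of ~850 dicts):
# asyncio.run(insert_all(documents))
```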

Hope it helps you.


Just for summary: we need to evaluate the default indexes for the collection. Maintaining them can take 100 to 1000x more RUs than actually writing the document, so restrict the indexing policy to the properties you query, as in the sketch below.
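A minimal sketch of that index fix, again assuming the Python SDK and a hypothetical queried property (/customerId); excluding everything else means each write no longer pays to index every property of these 100 - 300 KB documents.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Only index the properties you actually query; exclude the rest.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/customerId/?"}],  # hypothetical queried field
    "excludedPaths": [{"path": "/*"}],             # skip everything else
}

client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/", credential="<your-key>"
)
database = client.get_database_client("mydb")
container = database.create_container_if_not_exists(
    id="mycollection",
    partition_key=PartitionKey(path="/id"),
    indexing_policy=indexing_policy,
)
```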