Partition key for DocumentDB

Question

I have a question about choosing a partition key in DocumentDB. My data has UserId, DeviceId and WhateverId. UserId will always be present in queries, so I chose UserId as the partition key. But I have a lot of data for a single user (millions of entities), and when I run a query like "SELECT * FROM c WHERE c.DeviceId = @DeviceId" with the partition key specified, it takes a long time (about 6 minutes for about 220,000 returned entities). Would it be more efficient to choose, for example, DeviceId as the partition key and query a few partitions in parallel (specifying EnableCrossPartitionQuery = true and MaxDegreeOfParallelism = partition count)? Or would it be a good idea to use a separate collection for every user?
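
The two query shapes being compared can be modelled without the SDK. This is a toy in-memory sketch, not the DocumentDB API: `partitions` is a hypothetical dict from partition-key value to documents, `query_single_partition` stands in for a query with the partition key specified, and `query_cross_partition` mimics EnableCrossPartitionQuery = true, with the hypothetical `max_degree_of_parallelism` argument playing the role of MaxDegreeOfParallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical document shape: {"UserId": ..., "DeviceId": ..., "Value": ...}
Doc = Dict[str, object]


def query_single_partition(partitions: Dict[str, List[Doc]],
                           partition_key: str, device_id: str) -> List[object]:
    """UserId as partition key: scan only that user's partition, filter by DeviceId."""
    return [d["Value"] for d in partitions.get(partition_key, [])
            if d["DeviceId"] == device_id]


def query_cross_partition(partitions: Dict[str, List[Doc]], user_id: str,
                          max_degree_of_parallelism: int) -> List[object]:
    """DeviceId as partition key: fan the UserId filter out to every partition,
    at most max_degree_of_parallelism at a time, then merge the results."""
    def scan(docs: List[Doc]) -> List[object]:
        return [d["Value"] for d in docs if d["UserId"] == user_id]

    # Threads model concurrent requests to the service; in this in-memory toy
    # they do not make the scan itself faster.
    with ThreadPoolExecutor(max_workers=max_degree_of_parallelism) as pool:
        per_partition = pool.map(scan, partitions.values())
        return [v for chunk in per_partition for v in chunk]
```

Note that the fan-out version still touches every partition, so its cost grows with the partition count even when the parallelism hides some of the latency.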

Not that this answers your question, but... I think any time you're trying to retrieve a quarter-million entities, you might want to rethink your data access pattern. Also, "SELECT *" is yet another code-smell. I don't see how your choice of partition key is going to make a difference if you're trying to move that much data to your app tier. — David Makogon
Thanks. SELECT * was just a quick example, sorry; I'll use SELECT c.Value. This question is only about choosing the partition key, because the information on the Azure documentation site is a little abstract for me. All these measurements are just for comparing performance across queries. — Paval

Larry Maccherone · Accepted Answer · 2016-11-03T10:42:01

It might help a little, but I don't think a partition for each user will solve your problem, because under the covers you essentially already have that.
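
"Under the covers" refers to hash partitioning: every document with the same partition-key value belongs to one logical partition, and that logical partition lives on a single physical partition. The sketch below illustrates only that mapping. The SHA-256-modulo-N scheme is a stand-in assumption; the service's real hash and range assignment are internal, and `physical_partition_for` and `place` are hypothetical helpers.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def physical_partition_for(partition_key: str, physical_partition_count: int) -> int:
    """Map a partition-key value to a physical partition index.

    SHA-256 modulo N is an illustrative stand-in, not the service's hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


def place(user_ids: List[str], physical_partition_count: int) -> Dict[int, List[str]]:
    """Group logical partitions (one per UserId) by the physical partition hosting them."""
    layout: Dict[int, List[str]] = defaultdict(list)
    for uid in user_ids:
        layout[physical_partition_for(uid, physical_partition_count)].append(uid)
    return dict(layout)
```

Because the mapping is deterministic, all of one user's millions of documents land in the same place, which is roughly the isolation a separate collection per user would give as well.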

You could experiment with the partition key to improve parallelism, but in my experience that would give you a 2x to 5x improvement at best. Is that enough?
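
A rough back-of-envelope check, using only the numbers from the question (about 220,000 entities in about 6 minutes) and the 2x to 5x range above:

$$
\frac{220\,000\ \text{entities}}{360\ \text{s}} \approx 611\ \text{entities/s},
\qquad
t_{\text{after}} \approx \frac{360\ \text{s}}{k},\quad k \in [2, 5]
\;\Rightarrow\;
72\ \text{s} \le t_{\text{after}} \le 180\ \text{s}.
$$

Even at the optimistic end, a single query would still take over a minute.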

For more dramatic improvements you usually have to resort to selective denormalization and/or caching.
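
A minimal sketch of both ideas, under the assumption (not stated in the question) that the application needs a per-device aggregate of Value rather than every raw value. `DeviceSummary`, `Store` and `ReadThroughCache` are hypothetical names; in DocumentDB the summary would be a separate document updated on each write, and the raw list here stands in for the collection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (UserId, DeviceId)


@dataclass
class DeviceSummary:
    """Hypothetical denormalized document: one per (UserId, DeviceId)."""
    count: int = 0
    total: float = 0.0


@dataclass
class Store:
    raw: List[dict] = field(default_factory=list)
    summaries: Dict[Key, DeviceSummary] = field(default_factory=dict)

    def write(self, user_id: str, device_id: str, value: float) -> None:
        """Store the raw entity and update its device summary in the same step."""
        self.raw.append({"UserId": user_id, "DeviceId": device_id, "Value": value})
        s = self.summaries.setdefault((user_id, device_id), DeviceSummary())
        s.count += 1
        s.total += value

    def summary(self, user_id: str, device_id: str) -> DeviceSummary:
        """One lookup instead of scanning every raw entity for the device."""
        return self.summaries.get((user_id, device_id), DeviceSummary())


class ReadThroughCache:
    """Serve repeated (UserId, DeviceId) reads from memory, loading on a miss.

    No invalidation or expiry: a real cache needs both."""

    def __init__(self, loader: Callable[[str, str], List[float]]):
        self._loader = loader
        self._entries: Dict[Key, List[float]] = {}
        self.misses = 0

    def get(self, user_id: str, device_id: str) -> List[float]:
        key = (user_id, device_id)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._loader(user_id, device_id)
        return self._entries[key]
```

The trade-off is extra work on every write (and a consistency concern if the summary update fails) in exchange for reads that no longer scale with the number of entities per device.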
