
I want to shard a collection by its "foreign key" field (userID) rather than by its _id field. All I need is for the combination of userID and _id to be unique, but I am not sure whether that is OK with MongoDB.

Warning: In any sharded collection where you are not sharding by the _id field, you must ensure uniqueness of the _id field. The best way to ensure _id is always unique is to use ObjectId, or another universally unique identifier (UUID).

This is taken from: http://docs.mongodb.org/manual/tutorial/enforce-unique-keys-for-sharded-collections/#enforce-unique-keys-for-sharded-collections
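The reason for the warning is that each shard enforces the unique index on _id only for the documents it holds itself. The toy model below is a simulation, not MongoDB code: the classes `Shard` and `ToyCluster` and the modulo routing rule are hypothetical stand-ins for real chunk-based routing. It routes documents by userID only and shows that the same _id can be stored on two different shards without any error.

```python
class Shard:
    """Toy shard: like a real shard, it enforces _id uniqueness only for its own documents."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # _id -> document; plays the role of the per-shard unique _id index

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise KeyError(f"duplicate _id {doc['_id']!r} on {self.name}")
        self.docs[doc["_id"]] = doc


class ToyCluster:
    """Toy router that picks a shard from the document's userID only."""

    def __init__(self, n_shards=2):
        self.shards = [Shard(f"shard{i}") for i in range(n_shards)]

    def route(self, user_id):
        # Hypothetical routing rule: integer userID modulo the shard count.
        # Real MongoDB routes by chunk ranges (or hashed ranges) of the shard key.
        return self.shards[user_id % len(self.shards)]

    def insert(self, doc):
        # Only the target shard checks for a duplicate _id; the others are never consulted.
        self.route(doc["userID"]).insert(doc)

    def count_id(self, _id):
        # How many documents across the whole cluster carry this _id?
        return sum(_id in shard.docs for shard in self.shards)


cluster = ToyCluster(n_shards=2)
cluster.insert({"_id": 1, "userID": 1})  # routed to shard1
cluster.insert({"_id": 1, "userID": 2})  # routed to shard0: accepted, although _id 1 already exists
print(cluster.count_id(1))  # 2
```

In this model the pair (userID, _id) still identifies each document, but _id on its own is no longer unique across the cluster, and nothing in the write path notices.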

Do I have to ensure that _id is unique, or is it good enough if I always query by both userID and _id?


1 Answer


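Unless you set them yourself, the auto-generated _id values are ObjectIds, which, according to the documentation, consist of "a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter". (That is the legacy layout; newer versions of the ObjectId specification replace the machine id and process id with a 5-byte random value generated once per process, which serves the same purpose.)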
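To make the quoted layout concrete, here is a minimal sketch that packs and unpacks those four fields by hand. It assumes the legacy layout and uses big-endian byte order throughout as a simplification; the function names and sample values are hypothetical. In practice you would let the driver generate ids (for example `bson.ObjectId` in PyMongo) rather than build the bytes yourself.

```python
import struct


def pack_legacy_object_id(timestamp, machine_id, pid, counter):
    """Pack the legacy 12-byte layout: 4-byte timestamp, 3-byte machine id,
    2-byte process id, 3-byte counter (big-endian here, as a simplification)."""
    return (
        struct.pack(">I", timestamp)     # seconds since the Unix epoch
        + machine_id.to_bytes(3, "big")  # typically derived from a hash of the host name
        + struct.pack(">H", pid)         # process id, must fit in 2 bytes
        + counter.to_bytes(3, "big")     # per-process incrementing counter
    )


def unpack_legacy_object_id(oid):
    """Split a 12-byte legacy ObjectId back into its four fields."""
    if len(oid) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return {
        "timestamp": struct.unpack(">I", oid[0:4])[0],
        "machine_id": int.from_bytes(oid[4:7], "big"),
        "pid": struct.unpack(">H", oid[7:9])[0],
        "counter": int.from_bytes(oid[9:12], "big"),
    }


oid = pack_legacy_object_id(timestamp=1_400_000_000, machine_id=0xABCDEF, pid=4242, counter=7)
print(len(oid), oid.hex())
print(unpack_legacy_object_id(oid))
```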

As you can see, a machine id is part of every ObjectId. That makes it practically impossible for two machines to generate the same ObjectId independently; it could only happen if they share the same machine id (a chance of about 1 in 16.8 million, and easy to check for when it does happen). The only other way to get a duplicate ObjectId is for a single process to generate more than 2^24 (about 16.8 million) ObjectIds within a single second, because the 3-byte counter then wraps around.
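The counter argument is just modular arithmetic on 3 bytes. The sketch below assumes the legacy layout quoted in the answer; the function names and sample values are hypothetical, and it models only the byte layout, not any driver's actual generator. It shows that one process in one second repeats its first id only at the (2^24 + 1)-th id, and that ids differing only in machine id never match.

```python
COUNTER_VALUES = 2 ** 24  # a 3-byte counter can hold 16,777,216 distinct values


def legacy_object_id(timestamp, machine_id, pid, counter):
    """Legacy 12-byte layout; the counter wraps modulo 2**24 because it has only 3 bytes."""
    return (
        timestamp.to_bytes(4, "big")
        + machine_id.to_bytes(3, "big")
        + pid.to_bytes(2, "big")
        + (counter % COUNTER_VALUES).to_bytes(3, "big")
    )


def id_number_n(n, start_counter=5, timestamp=1_400_000_000, machine_id=0xABCDEF, pid=4242):
    """The n-th id (0-based) generated by one process within one second (hypothetical values)."""
    return legacy_object_id(timestamp, machine_id, pid, start_counter + n)


first = id_number_n(0)
print(id_number_n(COUNTER_VALUES - 1) == first)  # False: still an unused counter value
print(id_number_n(COUNTER_VALUES) == first)      # True: the counter has wrapped around

# Ids that differ only in machine id never produce the same bytes.
print(legacy_object_id(1_400_000_000, 1, 4242, 5) == legacy_object_id(1_400_000_000, 2, 4242, 5))  # False
```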

tl;dr: You don't have to worry about duplicate ObjectIds; they are, as the documentation puts it, "designed to have a reasonably high probability of being unique when allocated".