I have set up my first MongoDB sharded cluster and am finally at the stage where I create a db/collection and choose the shard key. I’ve read about how to choose an appropriate shard key and am likely going with a hashed index, but I might have some conceptual misunderstandings.

My documents are super simple and contain a document id (some natural number), a document version id (a natural number), and a string of the raw text itself. If I understand correctly from the documentation, I can choose to shard on the document id, but this can lead to a hot shard, since the document id is incremented and new documents will all be added to the same shard. So instead I could set the shard key to a hashed value of the document id.

My question is: can I still query by the document id? My brain is making me doubt this and making me think that the indexing of the documents is over the hashed shard key and not over the document id. I am hoping that the hashed shard key is used strictly for sharding and that I can set any key (e.g., the document id) to be indexed. Is this correct?

1 Answer

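Yes, you can still query by the value of the shard key. Hashing only decides which shard a document is placed on; your queries still use the original, unhashed document id.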

If you are referring to _id, that is automatically indexed with its natural value. Otherwise, you could explicitly create a non-hashed index on the document id in addition to the shard key index.

As long as you test for equality against a single value or an explicit list of values (for example with $in), the query should be handled by the minimum number of shards.

However, if you use a range condition such as $gte, the query will have to be forwarded to every shard to be processed, because hashing scatters adjacent document ids across the cluster.
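To make the difference between targeted and broadcast queries concrete, here is a rough simulation in plain Python. It is only a sketch: the three-shard cluster, the even split of the hash space, and the use of MD5 via `hashlib` are my own assumptions for illustration, not MongoDB's actual hash function or chunk layout.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size, chosen for illustration


def hashed_key(doc_id: int) -> int:
    """Stand-in for a hashed shard key: a 64-bit value derived from the id.

    This is NOT MongoDB's real hashing algorithm, just something uniform.
    """
    digest = hashlib.md5(str(doc_id).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(doc_id: int) -> int:
    """Assume each shard owns an equal slice of the 64-bit hash space."""
    return hashed_key(doc_id) * NUM_SHARDS // 2**64


def shards_for_equality(doc_ids) -> set:
    """Equality or $in style lookup: hash each value, contact only those shards."""
    return {shard_for(d) for d in doc_ids}


def shards_for_range(low: int, high: int) -> set:
    """Range lookup: neighbouring ids hash to unrelated places, so the
    router cannot narrow the target and has to ask every shard."""
    return set(range(NUM_SHARDS))


# Monotonically increasing ids still spread across all shards once hashed.
placement = Counter(shard_for(d) for d in range(1, 3001))

if __name__ == "__main__":
    print("documents per shard:", dict(placement))
    print("shards for doc_id == 42:", shards_for_equality([42]))
    print("shards for 10 <= doc_id <= 20:", shards_for_range(10, 20))
```

A lookup for a single id touches exactly one shard in this model, while the range lookup always touches all three, which is the trade-off described above.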

Using the hashed document id as the shard key will result in the creation of an index for the hashed value in addition to any other indexes.
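For reference, the setup could look roughly like this from Python. Treat it as a sketch under assumptions: the names `mydb`, `docs`, and `doc_id` are placeholders I made up, `client` is expected to be a `pymongo.MongoClient` connected to a mongos router, and the collection is assumed to be new or empty. The extra ascending index is only needed if you also want range queries or sorting on the document id.

```python
def shard_on_hashed_doc_id(client, db_name="mydb", coll_name="docs"):
    """Shard a collection on a hashed doc_id and add a plain index as well.

    `client` should be a pymongo.MongoClient pointed at a mongos router.
    The database, collection, and field names are placeholders.
    """
    namespace = f"{db_name}.{coll_name}"

    # Allow sharding for the database (newer server versions may not need this).
    client.admin.command("enableSharding", db_name)

    # Shard on the hashed document id; for an empty collection the
    # supporting hashed index is created as part of this step.
    client.admin.command("shardCollection", namespace, key={"doc_id": "hashed"})

    # Optional: a regular ascending index on the raw value, useful for
    # range queries and sorting. It does not change how data is distributed.
    client[db_name][coll_name].create_index([("doc_id", 1)])

    return namespace
```

After this, something like `client["mydb"]["docs"].find_one({"doc_id": 42})` queries by the plain document id; you never compute the hash yourself.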

There is a pretty good description of hashed sharding in the MongoDB documentation.