Problem: How do I shard a collection by a hashed index on a custom _id field?
Problem description:
- I need to store url => my_value pairs in MongoDB
- Each url must be unique
- I will run a lot of queries to check whether I already have a document for a given url, by matching {_id : md5(url_to_check)}
- The collection will be huge (billions of url => my_value pairs), so I want to shard it by url.
Solution I am considering:
Create a collection with these fields:
- _id : md5(url)
- url : url
- value : my_value
I don't create any additional index. _id is indexed by MongoDB by default.
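To make the layout above concrete, here is a minimal sketch in Python of how one document and the matching lookup filter could be built. The helper names `make_doc` and `lookup_filter` are hypothetical, and I am assuming the md5 is stored as a 32-character hex string and the url is encoded as UTF-8 before hashing.

```python
import hashlib


def url_key(url):
    """Return the hex MD5 of the url (assumption: UTF-8 encoding, hex string form)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def make_doc(url, value):
    """Build one document with the fields listed above."""
    return {"_id": url_key(url), "url": url, "value": value}


def lookup_filter(url_to_check):
    """Build the filter used to check whether a url is already stored."""
    return {"_id": url_key(url_to_check)}
```

With a driver such as pymongo, the document would go to `insert_one` and the filter to `find_one`; uniqueness of the url then relies on the unique _id index (ignoring md5 collisions).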
Questions:
- I would like to shard the collection by _id. A hashed shard key would be perfect, but do I have to create a hashed shard key, or can I just shard by the regular _id key? The value I insert into _id is already an md5 that I computed myself.
- What do you think about storing the unhashed url in _id and querying by it? I would use less space (I don't have to store md5(url)), but the shard key and the index would be on a bigger text field (a url is usually longer than 32 characters).
- What is the best solution to this problem? By best I mean fast queries and as little space used for indexes as possible.
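To make the options in the questions above easier to compare, here is a small sketch in Python. It only builds the two `shardCollection` command documents (regular vs. hashed shard key on _id) and measures candidate key sizes; it does not connect to a cluster. The namespace `mydb.urls` and the sample url are hypothetical, and the raw 16-byte digest is shown only as a point of comparison, not something I currently store.

```python
import hashlib


def shard_command(namespace, hashed):
    """Build a shardCollection command document for _id.

    hashed=False -> ranged shard key {"_id": 1}
    hashed=True  -> hashed shard key {"_id": "hashed"}
    """
    return {"shardCollection": namespace, "key": {"_id": "hashed" if hashed else 1}}


def key_sizes(url):
    """Return the size in bytes of each candidate _id value for one url."""
    raw = url.encode("utf-8")
    digest = hashlib.md5(raw)
    return {
        "url": len(raw),                      # unhashed url as _id
        "md5_hex": len(digest.hexdigest()),   # hex string as _id
        "md5_raw": len(digest.digest()),      # raw digest, for comparison only
    }


ranged = shard_command("mydb.urls", hashed=False)   # hypothetical namespace
hashed = shard_command("mydb.urls", hashed=True)
sizes = key_sizes("http://example.com/some/long/path?query=1")  # hypothetical url
```

Either command document could be sent to the admin database, for example with pymongo's `client.admin.command(...)`, after sharding has been enabled for the database. The sizes count only the key value itself, not BSON or index overhead.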