3
votes

I want friendlier, user-facing ids (i.e. YouTube style: /posts/cxB6Ey6) than MongoDB's ObjectID.

I read that for scalability it's best to leave _id as an ObjectID, so I thought about two solutions:

1) add an indexed postid field to each document

2) create a mapping collection between _id and the postid

In both cases, use something like https://github.com/dylang/shortid to generate the short id and, while generating, make sure the id is unique by querying the database. (Can this query-generate-insert be an atomic operation?)

Will those solutions have a noticeable impact on performance?

What's the best strategy for doing this?

3
I don't think anyone went and read the shortid code suggested in the first post (github.com/dylang/shortid). It is a unique identifier, provided you manage the host identifier when scaling. I will defer to the experts on not messing with the original ObjectID and go with Sammaye's answer: put it into a new field (e.g. PostID) that you index. - Mikey Mr.H

3 Answers

5
votes

The normal method of doing this is to base64 encode a unique id but:

add an indexed postid field to each document

You definitely want to go for this method. Of the two, it is easily the more scalable and performant: it needs only one round trip to fetch a short URL's details, whereas the second option takes two. Another consideration is the storage and index overhead of maintaining an extra collection, so this is a bit of a no-brainer.

I would not replace the _id field within the document either since the default ObjectId could still be useful in the foreseeable future.

So this narrows it down to a separate field with a unique index for the short code of a URL.

The next thing is that you don't want an ID which forces you to query the database for uniqueness prior to every insert. This is where the ObjectId shines: it can be generated within the client application and still be unique in the database, without a query to verify that.

Unique ids that do not require querying the database first are normally time based. PHP's uniqid ( http://php.net/manual/en/function.uniqid.php ), the ObjectId in the MongoDB drivers ( http://docs.mongodb.org/manual/core/object-id/ ) and even the plug-in you linked on GitHub ( https://github.com/dylang/shortid/blob/master/lib/shortid.js#L50 ) all use time as the basis for uniqueness.
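To illustrate the time-based idea, here is a minimal sketch in Python. It is not the shortid implementation; the bit layout, the custom epoch, and the names `make_short_id`, `worker_id` and `EPOCH_MS` are all assumptions made for the example. It packs a millisecond timestamp, a per-host worker id and a few random bits into one integer and base62-encodes it.

```python
import secrets
import time
from typing import Optional

# Assumed custom epoch (2020-01-01T00:00:00Z, in ms) to keep ids short.
EPOCH_MS = 1_577_836_800_000
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_base62(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def make_short_id(worker_id: int = 0, now_ms: Optional[int] = None) -> str:
    """Hypothetical layout: elapsed ms | 6-bit worker id | 12 random bits."""
    if not 0 <= worker_id < 64:
        raise ValueError("worker_id must fit in 6 bits")
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    elapsed = now_ms - EPOCH_MS
    value = (elapsed << 18) | (worker_id << 12) | secrets.randbits(12)
    return encode_base62(value)
```

Two ids generated in the same millisecond on the same worker can still collide if the random bits match, so a unique index on the field remains the real guarantee.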

Considering that the plug-in you linked does not query the database to check its own IDs' uniqueness, I would say it is probably quite performant, and if you use it with the first solution you stated you should get a good benchmark out of it.

3
votes

If you want to replace the built-in ObjectID with custom user-friendly short ids, then do it. You can either use the built-in _id field or add a new unique-indexed field for your custom ids. The benefit of the built-in ObjectIDs is that they won't collide even if your database is extremely large. So by replacing them with short ids you take on the risk of id duplication.

Now about performance. I think the best solution is not to query the DB for ids, because with a properly chosen id length the probability of duplication is extremely small. The best way to handle duplicate ids in this model is to check Mongo's responses: if it responds with a "duplicate key error", generate a new id and try again.
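The insert-then-retry approach can be sketched as follows. To keep the example self-contained, `FakePosts` and the local `DuplicateKeyError` are in-memory stand-ins (assumptions, not a real driver) for a collection with a unique index on `postid`; with PyMongo the equivalent exception would be `pymongo.errors.DuplicateKeyError`. The helper names are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


class DuplicateKeyError(Exception):
    """Stand-in for the driver's duplicate-key exception."""


class FakePosts:
    """In-memory stand-in for a collection with a unique index on postid."""

    def __init__(self):
        self._by_postid = {}

    def insert_one(self, doc):
        if doc["postid"] in self._by_postid:
            raise DuplicateKeyError(doc["postid"])
        self._by_postid[doc["postid"]] = doc


def random_id(length=7):
    """Random short id; the length of 7 is an arbitrary choice."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_with_short_id(posts, doc, make_id=random_id, max_attempts=5):
    """Insert doc with a fresh postid, retrying only on duplicate keys."""
    for _ in range(max_attempts):
        candidate = dict(doc, postid=make_id())
        try:
            posts.insert_one(candidate)
            return candidate["postid"]
        except DuplicateKeyError:
            continue  # collision: generate a new id and try again
    raise RuntimeError("could not find a free short id")
```

Because the unique index rejects the duplicate at insert time, the insert itself acts as the uniqueness check, and no separate query is needed beforehand.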

And now about scaling. To scale your custom ids you can just add a few more characters to them. A "duplicate key error" should be the trigger for making that change. Normally there should be no such errors, so if they start to appear then it's time to scale.
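As a rough way to judge when more characters are needed, the standard birthday approximation estimates the chance of at least one collision among purely random ids. This is a sketch under the assumption that ids are drawn uniformly at random; the function name and the example figures are illustrative only.

```python
import math


def collision_probability(num_ids: int, alphabet_size: int, length: int) -> float:
    """Approximate P(at least one collision) among num_ids random ids.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)),
    where N = alphabet_size ** length is the size of the id space.
    """
    space = alphabet_size ** length
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


# One million random ids over a 62-character alphabet:
p7 = collision_probability(10**6, 62, 7)  # 7-character ids
p8 = collision_probability(10**6, 62, 8)  # 8-character ids
```

Each extra character multiplies the id space by the alphabet size, so the estimate for 8 characters is far lower than for 7.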

1
votes

I don't think that generating an ObjectId for the _id field directly affects scalability or performance. How would that happen?

The main difference is that ObjectIds are generated for you by MongoDB (in practice, by the driver), so you don't burden yourself with that responsibility. Otherwise you must determine the optimal id size yourself and ensure a unique value for the _id field of every document stored in the collection. That is required because _id is used as the primary key. This can be justified if your collection is not very big and you need a custom identifier value.

But an _id field that stores ObjectId values gives you additional benefits: you can create ObjectIds from a time and use that to your advantage in queries. You can also get the timestamp of an ObjectId's creation with the getTimestamp() method. And sorting on _id is then roughly equivalent to sorting by creation time.

But if you're going to expose an ObjectId in URLs or HTML, you may want to encrypt it for security reasons, to avoid leaking information such as the object's creation time, which may be a security risk.
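For illustration, the creation time can be recovered without a driver, because the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds. The sketch below works on the 24-character hex form; the function name is an assumption for the example.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character hex ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId has 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

This is also why anyone who sees an ObjectId can read off when the document was created.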

About your solutions:

1) I think this is a very convenient and flexible solution. In this case you can put any value in postId, independent of _id.

A small disadvantage of this solution is that you need an extra field and an extra index, while _id is indexed automatically.

2) I don't think this is a good solution from the point of view of performance or of the NoSQL philosophy.