28
votes

I have a general database structure question. In my scenario I happen to be using MongoDB.

I'm creating an application where a user can upload a list of songs (title, artist, etc.), but I'm not sure whether I should have one songList collection for all users or a separate songList.user# collection for each individual user. Users can only ever query songs associated with them, so user A will NEVER know about user B's songs.

Code Examples:

One collection per user

db.songList.userA.find()
{"title": "Some song of user A", "artist": "Some artist of user A"}

db.songList.userB.find()
{"title": "Some song of user B", "artist": "Some artist of user B"}
  • Pros
    • Smaller collection size to query
  • Cons
    • Maintainability
      • 1,000 users means 1,000 collections

vs. a single collection with an owning 'user' field

db.songList.find({"user":"A"})
{"title": "Some song of user A", "artist": "Some artist of user A", "user": "A"}
  • Pros
    • Flexibility to query across users if the need ever arose
  • Cons
    • Performance
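With a single collection, isolation between users depends on every query carrying the owner's `user` value. A minimal sketch, assuming a hypothetical helper `songsForUser` (not part of MongoDB) that builds the filter object you would pass to `db.songList.find()`; because `user` is applied last, a caller-supplied filter cannot override it.

```javascript
// Hypothetical helper: build a find() filter that is always scoped to one user.
// Object.assign copies extraFilter first, then sets `user`, so any `user` key
// in extraFilter is overwritten and extraFilter itself is not mutated.
function songsForUser(userId, extraFilter = {}) {
  if (typeof userId !== "string" || userId.length === 0) {
    // Refuse to build an unscoped query rather than silently returning all songs.
    throw new Error("userId is required to scope the query");
  }
  return Object.assign({}, extraFilter, { user: userId });
}

// Usage (assumes the songList collection from the question):
// db.songList.find(songsForUser("A", { artist: "Some artist of user A" }))
```

For the scoped query to avoid a full collection scan, an index on `user` is assumed, e.g. `db.songList.createIndex({ user: 1 })`.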

I'm trying to build a pro/con list but am still on the fence. Given that each user's songs are going to be isolated from each other's, which approach is better? My main concerns are maintenance and query performance.

Thanks in advance.

Instead of worrying about things like this, build something. You'll probably find out what'll work best by building it, instead of worrying about details. - SomeKittens
Agree with @SomeKittens. That said, I'd do it per user, because with a single collection it's easier to make a mistake and show A's songs to B. Anyway, I'd worry about optimization if/when I had enough users. - Diego Basch
Security-wise, having one collection per user makes it possible to use MongoDB's collection-level access control mechanisms. This way, the database itself can ensure that one user never accesses another's data. - Boris van Schooten
@Steven I am curious to know which design you went with as I am also facing a similar dilemma at the moment. - MadPhysicist

2 Answers

13
votes

I would recommend NOT making a separate collection per user.

Read the documentation

By default MongoDB has a limit of approximately 24,000 namespaces per database. Each namespace is 628 bytes, the .ns file is 16MB by default.

Each collection counts as a namespace, as does each index. Thus if every collection had one index, we can create up to 12,000 collections. The --nssize parameter allows you to increase this limit (see below).

Be aware that there is a certain minimum overhead per collection -- a few KB. Further, any index will require at least 8KB of data space as the b-tree page size is 8KB. Certain operations can get slow if there are a lot of collections and the meta data gets paged out.

So you won't be able to handle it gracefully if the number of users exceeds the namespace limit. Performance will also suffer as your user base grows.
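The capacity arithmetic in the quoted documentation can be sketched as follows. This is a rough estimate under the assumptions stated there (legacy MMAPv1 storage, default 16MB .ns file, roughly 24,000 namespaces, each collection and each index using one namespace); `maxCollections` is a hypothetical helper, and with one collection per user its result is also the approximate maximum number of users.

```javascript
// Rough capacity estimate for the legacy MMAPv1 namespace limit:
// every collection and every index consumes one namespace.
// nsLimit defaults to the documented ~24,000 (an approximation, not an exact cap).
function maxCollections(indexesPerCollection, nsLimit = 24000) {
  const namespacesPerCollection = 1 + indexesPerCollection;
  return Math.floor(nsLimit / namespacesPerCollection);
}

maxCollections(1); // 12000: the docs' "up to 12,000 collections" (e.g. just the _id index)
maxCollections(2); // 8000: e.g. the _id index plus one extra index such as one on title
```

Each extra index per collection shrinks the ceiling further, which is why per-user collections scale poorly under this storage engine.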

UPDATE

As @Henry Liu mentioned in the comments, with MongoDB 3.0 or later using the WiredTiger storage engine, this limit no longer applies.

docs.mongodb.org/manual/reference/limits/#namespaces

9
votes

MongoDB is great at scaling horizontally. It can shard a collection across a dynamic cluster to produce a fast, queryable collection of your data.

So a smaller collection size is not really a pro, and I am not sure where the theory that it is comes from; it isn't one in SQL and it isn't one in MongoDB. Sharding, if done well, should perform comparably to querying a single small collection of data (with a small overhead). If it doesn't, you have set up your sharding wrong.

MongoDB is not great at scaling vertically. As @Sushant quoted, the namespace (.ns) file size of MongoDB would be a serious limitation here. One thing worth stressing is that indexes also count against the namespace limit, which is why the documentation says:

Thus if every collection had one index, we can create up to 12,000 collections. The --nssize parameter allows you to increase this limit (see below).