Trade-offs when modelling one-to-many relationships with document references in MongoDB?

Question

My question arises from the following page in the MongoDB documentation: Model One-to-Many Relationships with Document References

The page presents two options to model one-to-many relationships with document references.

Option #1: Store an array of references in the "one" document. For example, the books field in the "publisher" document:

{
   _id: "oreilly",
   name: "O'Reilly Media",
   founded: 1980,
   location: "CA",
   books: [123456789, 234567890, ...]

}

{
    _id: 123456789,
    title: "MongoDB: The Definitive Guide",
    author: [ "Kristina Chodorow", "Mike Dirolf" ],
    published_date: ISODate("2010-09-24"),
    pages: 216,
    language: "English"
}

{
   _id: 234567890,
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English"
}

Option #2: Store a single reference in the "many" documents. For example, the publisher_id field in the "book" documents:

{
   _id: "oreilly",
   name: "O'Reilly Media",
   founded: 1980,
   location: "CA"
}

{
   _id: 123456789,
   title: "MongoDB: The Definitive Guide",
   author: [ "Kristina Chodorow", "Mike Dirolf" ],
   published_date: ISODate("2010-09-24"),
   pages: 216,
   language: "English",
   publisher_id: "oreilly"

}

{
   _id: 234567890,
   title: "50 Tips and Tricks for MongoDB Developer",
   author: "Kristina Chodorow",
   published_date: ISODate("2011-05-06"),
   pages: 68,
   language: "English",
   publisher_id: "oreilly"

}

As the MongoDB documentation points out, one advantage of option #2 is to avoid mutable, growing arrays. That makes a ton of sense, but I'd like to know if there are other advantages/disadvantages to either options generally.

But in particular, I'm curious to know if the "get all books for a publisher" query would be faster when using one of these two options (assuming you already retrieved the "publisher" document).

With option #1, the query would look like:

db.books.find( { _id : { $in : [123456789, 234567890, ...] } } );

With option #2, the query would look like:

db.books.find( { publisher_id : "oreilly" } );

The query for option #2 looks more simple, but is it faster?

[EDIT]: Ultimately, I want to know if there is a single advantage to option #1?

Amit Wagner Amit Wagner · Accepted Answer · 2019-08-04T15:50:16

in case of both options have there field indexed (if not it will do full scan either way) then option 2 will be faster only if you have a lot if ids in the array

the search for the first option will be O(m log n) where m is the ids array and n is the collection. in option 2 it will take only O(log n) as its need only to find the first document that had the publisher and then just start streaming the results

Trade-offs when modelling one-to-many relationships with document references in MongoDB?

1 Answers