Here's the deal. Let's suppose we have the following data schema in MongoDB:
items
: a collection with large documents that hold some data (it's absolutely irrelevant what it actually is).item_groups
: a collection with documents that contain a list ofitems._id
calleditem_groups.items
plus some extra data.
So, these two are tied together with a Many-to-Many relationship. But there's one tricky thing: for a certain reason I cannot store items within item groups, so -- just as the title says -- embedding is not the answer.
The query I'm really worried about is intended to find some particular groups that contain some particular items (i.e. I've got a set of criteria for each collection). In fact it also has to say how much items within each found group fitted the criteria (no items means group is not found).
The only viable solution I came up with this far is to use a Map/Reduce approach with a dummy reduce function:
function map () {
// imagine that item_criteria came from the scope.
// it's a mongodb query object.
item_criteria._id = {$in: this.items};
var group_size = db.items.count(item_criteria);
// this group holds no relevant items, skip it
if (group_size == 0) return;
var key = this._id.str;
var value = {size: group_size, ...};
emit(key, value);
}
function reduce (key, values) {
// since the map function emits each group just once,
// values will always be a list with length=1
return values[0];
}
db.runCommand({
mapreduce: item_groups,
map: map,
reduce: reduce,
query: item_groups_criteria,
scope: {item_criteria: item_criteria},
});
The problem line is:
item_criteria._id = {$in: this.items};
What if this.items.length == 5000 or even more? My RDBMS background cries out loud:
SELECT ... FROM ... WHERE whatever_id IN (over 9000 comma-separated IDs)
is definitely not a good way to go.
Thank you sooo much for your time, guys!
I hope the best answer will be something like "you're stupid, stop thinking in RDBMS style, use $its_a_kind_of_magicSphere from the latest release of MongoDB" :)