
New to MongoDB, I am trying to understand the 100MB limit for aggregation pipelines. What does this limit actually mean? Does it apply to the size of the collection we are running the aggregation on?

A bit of background: we have the following query on an inventory ledger. We take a data set and run a group sum to find out which products are still in stock (i.e. the summed quantity is greater than 0). For the products that are in stock, we then return their records by running a lookup back into the original collection. The query is provided below.

Assume each inventory object contains about 10 sub-fields per record, and assume roughly 1,000 records per 1MB.

QUESTION: If the inventory collection reaches 100MB as a JSON object array, does this mean the call will fail? i.e. is the maximum we can run the aggregation on 100MB × 1,000 records per MB = 100,000 records?

BTW, we are on a server that does not support writing to disk, hence the question.

db.inventory.aggregate([
  {
    // Group by product and sum the quantities of all its ledger entries.
    $group: {
      _id: {
        group_id: "$product"
      },
      quantity: {
        $sum: "$quantity"
      }
    }
  },
  {
    // Keep only the products that are still in stock.
    $match: {
      quantity: {
        $gt: 0
      }
    }
  },
  {
    // Pull the original inventory records for each in-stock product.
    // Note: foreignField takes a plain field name, not a "$"-prefixed path.
    $lookup: {
      from: "inventory",
      localField: "_id.group_id",
      foreignField: "product",
      as: "records"
    }
  }
])
What do you mean by "server that does not support writing to disk"? The temp files used by the {allowDiskUse: true} option are written to the <dbPath>/_tmp directory. Without write access to it, the MongoDB service does not start at all. – Wernfried Domscheit
Using MongoDB Atlas M5 shared clusters. It's not supported there, unfortunately. – Ka Tech
Where is this limit described? – D. SM

1 Answer


The 100MB limit is a restriction on the amount of memory used by an aggregation stage.
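For completeness: on deployments that allow temp files, a single call can be told to spill blocking stages to disk with the allowDiskUse option (as the comments note, this is not available on M5 shared clusters). A minimal sketch, where pipeline stands for the stage array from your question:

// Lets stages that exceed the 100MB memory limit spill to temp files
// under <dbPath>/_tmp; only useful where writing temp files is allowed.
db.inventory.aggregate(pipeline, { allowDiskUse: true })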

The pipeline in your question first needs to read every document from the collection. It does this by requesting the documents from the storage engine, which will read each document from the disk and store it in the in-memory cache. The cache does not count against the 100MB limit.

The aggregation process receives documents one at a time from the storage engine and passes each one through the pipeline until it reaches the first blocking stage ($group is a blocking stage).

The $group stage examines the input document, updates the fields in the matching group, and then discards the input document (a rough sketch of this accumulation follows the list below).

This means the memory required by the group stage will be the sum of:

  • the size of 1-2 documents
  • total storage size for each result group
  • any scratch space needed for the operations to build each result
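As a rough illustration (a sketch of the idea, not the server's actual implementation), a blocking $group behaves like accumulating running sums in a map keyed by the group key, so its memory grows with the number of distinct products rather than with the number of documents streamed through it:

// Sketch only: accumulate one running total per distinct product.
// Each input document is discarded once its quantity has been added,
// so memory is proportional to the number of distinct keys.
const groups = new Map();
db.inventory.find().forEach(doc => {
  groups.set(doc.product, (groups.get(doc.product) || 0) + doc.quantity);
});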

The specific $group stage in the question returns a product identifier and an integer.

Using the Object.bsonsize function in the mongo shell, we can see that a null product ID produces a 43-byte object:

> Object.bsonsize({_id:{group_id:null},quantity:0})
43

So the total memory required will be

<number of distinct `product` values> x (<size of a product value> + 43)

Note that the values are stored in BSON, so a string takes its length + 5 bytes, a UUID takes 21 bytes, etc.
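As a rough worked example (assuming, purely for illustration, 20-character string product codes), each result group comes to about 68 bytes:

> Object.bsonsize({_id:{group_id:"PRODUCT-000000000001"},quantity:0})
68

So even 100,000 distinct products, the worst case for the roughly 100,000 records mentioned in the question, would need about 100,000 × 68 bytes ≈ 6.8MB for the $group stage, well under the 100MB limit.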