8
votes

Should I use the allowDiskUse option when returned doc exceed 16MB limit in aggregation?

Or should I alter db structure or codes logic to avoid the limit? What's the advantage and disadvantage of 'allowDiskUse'? Thanks for your help.

Hers is the official doc I have seen: Result Size Restrictions

Changed in version 2.6.

Starting in MongoDB 2.6, the aggregate command can return a cursor or store the results in a collection. When returning a cursor or storing the results in a collection, each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes; if any single document that exceeds the BSON Document Size limit, the command will produce an error. The limit only applies to the returned documents; during the pipeline processing, the documents may exceed this size.

Memory Restrictions¶

Changed in version 2.6.

Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files. https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/

1

1 Answers

14
votes

allowDiskUse is unrelated to the 16MB result size limit. That setting controls whether pipeline steps such as $sort or $group can use some temporary disk space if they need more than 100MB of memory. In theory, for an arbitrary pipeline this could be a very large amount of diskspace. Personally it's never been a problem, but that will be down to your data.

If your result is going to be more than 16MB then you need to use the $out pipeline stage to output the data to a collection or use a pipeline API that returns a cursor to results instead of returning all the data inline (for some drivers this is a separate method, for others it is a flag passed to the same method).