86
votes

Using the code:

all_reviews = db_handle.find().sort('reviewDate', pymongo.ASCENDING)
print all_reviews.count()

print all_reviews[0]
print all_reviews[2000000]

The count prints 2043484, and it prints all_reviews[0].

However when printing all_reviews[2000000], I get the error:

pymongo.errors.OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33554495 bytes exceeds internal limit of 33554432 bytes

How do I handle this?

6

6 Answers

120
votes

You're running into the 32MB limit on an in-memory sort:

https://docs.mongodb.com/manual/reference/limits/#Sort-Operations

Add an index to the sort field. That allows MongoDB to stream documents to you in sorted order, rather than attempting to load them all into memory on the server and sort them in memory before sending them to the client.

46
votes

As said by kumar_harsh in the comments section, i would like to add another point.

You can view the current buffer usage using the below command over the admin database:

> use admin
switched to db admin
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }

It has a default value of 32 MB(33554432 bytes).In this case you're running short of buffer data so you can increase buffer limit with your own defined optimal value, example 50 MB as below:

>  db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes:50151432})
{ "was" : 33554432, "ok" : 1 }

We can also set this limit permanently by the below parameter in the mongodb config file:

setParameter=internalQueryExecMaxBlockingSortBytes=309715200

Hope this helps !!!

Note:This commands supports only after version 3.0 +

25
votes

solved with indexing

db_handle.ensure_index([("reviewDate", pymongo.ASCENDING)])
15
votes

If you want to avoid creating an index (e.g. you just want a quick-and-dirty check to explore the data), you can use aggregation with disk usage:

all_reviews = db_handle.aggregate([{$sort: {'reviewDate': 1}}], {allowDiskUse: true})

(Not sure how to do this in pymongo, though).

3
votes

JavaScript API syntax for the index:

db_handle.ensureIndex({executedDate: 1})
2
votes

In my case, it was necessary to fix nessary indexes in code and recreate them:

rake db:mongoid:create_indexes RAILS_ENV=production

As the memory overflow does not occur when there is a needed index of field.

PS Before this I had to disable the errors when creating long indexes:

# mongo
MongoDB shell version: 2.6.12
connecting to: test
> db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

Also may be needed reIndex:

# mongo
MongoDB shell version: 2.6.12
connecting to: test
> use your_db
switched to db your_db
> db.getCollectionNames().forEach( function(collection){ db[collection].reIndex() } )