0
votes

"events" is a capped collection that stores user click events on a webpage. A document looks like this:

{
    "event_name" : "click",
    "user_id" : "ea0b4027-05f7-4902-b133-ff810b5800e1",
    "object_type" : "ad",
    "object_id" : "ea0b4027-05f7-4902-b133-ff810b5822e5",
    "object_properties" : { "foo" : "bar" },
    "event_properties" : {"foo" : "bar" },
    "time" : ISODate("2014-05-31T22:00:43.681Z")
}

Here's a compound index for this collection:

db.events.ensureIndex({object_type: 1, time: 1});

This is how I am querying:

db.events.find( {
   $or : [ {object_type : 'ad'}, {object_type : 'element'} ],
   time: { $gte: new Date("2013-10-01T00:00:00.000Z"), $lte: new Date("2014-09-01T00:00:00.000Z") }}, 
  { user_id: 1, event_name: 1, object_id: 1,  object_type : 1,  obj_properties : 1, time:1 } )
.sort({time: 1});

This is causing: "too much data for sort() with no index. add an index or specify a smaller limit" in mongo 2.4.9 and "Overflow sort stage buffered data usage of 33554618 bytes exceeds internal limit of 33554432 bytes" in Mongo 2.6.3. I'm using Java MongoDB driver 2.12.3. It throws the same error when I use "$natural" sorting. It seems like MongoDB is not really using the index defined for sorting, but I can't figure out why (I read MongoDB documentation on indexes). I appreciate any hints.

Here is the result of explain():

{
    "clauses" : [
        {
            "cursor" : "BtreeCursor object_type_1_time_1",
            "isMultiKey" : false,
            "n" : 0,
            "nscannedObjects" : 0,
            "nscanned" : 0,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "object_type" : [
                    [
                        "element",
                        "element"
                    ]
                ],
                "time" : [
                    [
                        {
                            "$minElement" : 1
                        },
                        {
                            "$maxElement" : 1
                        }
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor object_type_1_time_1",
            "isMultiKey" : false,
            "n" : 399609,
            "nscannedObjects" : 399609,
            "nscanned" : 399609,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "object_type" : [
                    [
                        "ad",
                        "ad"
                    ]
                ],
                "time" : [
                    [
                        {
                            "$minElement" : 1
                        },
                        {
                            "$maxElement" : 1
                        }
                    ]
                ]
            }
        },
    "cursor" : "QueryOptimizerCursor",
    "n" : 408440,
    "nscannedObjects" : 409686,
    "nscanned" : 409686,
    "nscannedObjectsAllPlans" : 409686,
    "nscannedAllPlans" : 409686,
    "scanAndOrder" : false,
    "nYields" : 6402,
    "nChunkSkips" : 0,
    "millis" : 2633,
    "server" : "MacBook-Pro.local:27017",
    "filterSet" : false
}
1
Have you tried adding an index for time solely, e.g. not a compound index?TeTeT
Yes I did create an index for each, "time" and "object_type", same problemksiomelo

1 Answers

1
votes

According to explain(), When the mongo run the query it did use the compound index. The problem is the sort({time:1}). Your index is {object_type:1, time:1}, it means the query results are ordered by object_type first, if the object_type is same, then ordered by time.

For the sort {time:1}, mongo have to load all the matched objects(399609) into the memory to sort by time due to the order is not the same to the index({object_type:1, time:1}). Assume that the avg size of object is 100 bytes, then the limit would be exceeded.

more info: http://docs.mongodb.org/manual/core/index-compound/

For instance, there are 3 objects with index {obj_type:1, time:1}:

{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "c", "time" : ISODate("2014-02-31T22:00:43.681Z")}
{"obj_type": "b", "time" : ISODate("2014-03-31T22:00:43.681Z")}

db.events.find({}).sort({"obj_type":1, "time":1}).limit(2)

{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "b", "time" : ISODate("2014-03-31T22:00:43.681Z")}

"nscanned" : 2  (This one use index order, which is sorted by {obj_type:1, time:1})

db.events.find({}).sort({"time":1}).limit(2)

{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "c", "time" : ISODate("2014-02-31T22:00:43.681Z")}

"nscanned" : 3 (This one will load all the matched results and then sort)