0
votes

I'm attempting to query all data from the errorlog collection, and in the same query grab a count of relevant irs_documents for each errorlog entry.

The problem is that there are too many records in the irs_documents collection to perform a $lookup.

Is there a performant method of doing this in one MongoDB query?

Failed attempt

db.getCollection('errorlog').aggregate(
  [
    {
        $lookup: {
          from: "irs_documents",
          localField: "document.ssn",
          foreignField: "ssn",
          as: "irs_documents"
        }
    },
    {
        $group: {
            _id: { document: "$document", error: "$error" },
            logged_documents: { $sum : 1 }
        }
    }
  ]
)

Error

Total size of documents in $lookup exceeds maximum document size

Clearly this solution won't work. MongoDB is literally attempting to gather whole documents with $lookup, where I just want a count.

"errorlog" collection sample data:

/* 1 */
{
    "_id" : ObjectId("56d73955ce09a5a32399f022"),
    "document" : {
        "ssn" : 1
    },
    "error" : "Error 1"
}

/* 2 */
{
    "_id" : ObjectId("56d73967ce09a5a32399f023"),
    "document" : {
        "ssn" : 2
    },
    "error" : "Error 1"
}

/* 3 */
{
    "_id" : ObjectId("56d73979ce09a5a32399f024"),
    "document" : {
        "ssn" : 3
    },
    "error" : "Error 429"
}

/* 4 */
{
    "_id" : ObjectId("56d73985ce09a5a32399f025"),
    "document" : {
        "ssn" : 9
    },
    "error" : "Error 1"
}

/* 5 */
{
    "_id" : ObjectId("56d73990ce09a5a32399f026"),
    "document" : {
        "ssn" : 1
    },
    "error" : "Error 8"
}

"irs_documents" collection sample data

/* 1 */
{
    "_id" : ObjectId("56d73905ce09a5a32399f01e"),
    "ssn" : 1,
    "name" : "Sally"
}

/* 2 */
{
    "_id" : ObjectId("56d7390fce09a5a32399f01f"),
    "ssn" : 2,
    "name" : "Bob"
}

/* 3 */
{
    "_id" : ObjectId("56d7391ace09a5a32399f020"),
    "ssn" : 3,
    "name" : "Kelly"
}

/* 4 */
{
    "_id" : ObjectId("56d7393ace09a5a32399f021"),
    "ssn" : 9,
    "name" : "Pippinpaddle-Oppsokopolis"
}
1

1 Answers

1
votes

The error is self explanatory. Lookup is essentially combining two documents into single BSON document so MongoDB document size limit is biting you back.

You need to ask yourself, is it absolute necessary to perform both actions in one operation? if yes, do it the way you have to do in previous versions of MongoDB where $lookup is not supported.

Said that, perform two queries and perform merger in your client.

OPTION #1: you can aggregate on irs_documents and export computed result into another collection. Since, there will be very few objects in each document, I don't think you'll hit problem. But, you may hit memory problems and forced to use disk for aggregation framework. Try following solution and see if it works.

db.irs_documents.aggregate([
{
  $group:{_id:"$ssn", count:{$sum:1}}  
},
{
  $out:"irs_documents_group"
}]);

db.errorlog.aggregate([
    {
        $lookup: {
          from: "irs_documents_group",
          localField: "document.ssn",
          foreignField: "ssn",
          as: "irs_documents"
        }
    },
    {
        $group: {
            _id: { document: "$document", error: "$error" },
            logged_documents: { $sum : 1 }
        }
    }
  ])

OPTION #2: If above solution is not working, you can always use map reduce, though it will not be an elegant solution but will work.