2
votes

I am trying to do an aggregate where I'd like to relate items from one array to another.

The idea is, get sentences related with terms where sentence contains all words from term items. The output will be a project with all texts fields and one custom field with all attributes from terms.

Here is my first collection:

db.terms.insertMany([
  { "_id" : 1, "items" : ["sun", "day"] },
  { "_id" : 2, "items" : ["moon", "night"] },
])

And the second one:

db.texts.insertMany([
  { "_id" : 1, "sentence" : ["a beautiful sun makes a bright day", "not usefull here"] },
])

The intent aggregate:

db.texts.aggregate([
    {$lookup: {
        from: "terms",
        let: { term_items: "$items" },
        pipeline: [
          { $match: { $expr: { "$sentence": { $all: "$$term_items" } } } }
        ],
        as: "term_obj"
    }},
]);

When I execute this aggregate I am receiving this error:

org.graalvm.polyglot.PolyglotException: Command failed with error 168 (InvalidPipelineOperator): 'Unrecognized expression '$$term_items'' on server localhost:27019. The full response is {"ok": 0.0, "errmsg": "Unrecognized expression '$$term_items'", "code": 168, "codeName": "InvalidPipelineOperator"}

Another intent:

db.texts.aggregate([
    {$lookup: {
        from: "terms",
        let: { term_items: "$items" },
        pipeline: [
          { $match: { $expr: { $in: ["$$term_items", "$sentence"] } } }
        ],
        as: "term_obj"
    }},
]);

The error:

org.graalvm.polyglot.PolyglotException: Command failed with error 40081 (Location40081): '$in requires an array as a second argument, found: missing' on server localhost:27019. The full response is {"ok": 0.0, "errmsg": "$in requires an array as a second argument, found: missing", "code": 40081, "codeName": "Location40081"}

What I am missing here?

1
If "moon" appears in the first element of sentence, and "night" in the second, should it match overall, or do all of the items need to be in the same sentence element? - Joe

1 Answers

4
votes

In the existing lookup, you are using $items before it has a value. $let is where you should be assigning $sentence from the outer document to a variable.

One possible solution to accomplish this lookup:

  • $map over the sentence array
  • for each sentence, $reduce over the items array and test with $in
  • $reduce over the resulting array of booleans to see if there was a sentence that matched all of the items
  • test the result using $match and $expr
db.texts.aggregate([
  {$lookup: {
      from: "terms",
      let: {sentences: "$sentence"},
      pipeline: [
        {$match: {
           $expr: {
             $reduce: {
               initialValue: false,
               input: {
                 $map: {
                   input: "$$sentences",
                   as: "sentence",
                   in: {$reduce: {
                        input: "$items",
                        initialValue: "true",
                        in: {$and: [
                            "$$value",
                            {$regexMatch: {
                                regex: "$$this",
                                input: "$$sentence"
                            }}
                        ]}
                   }}
                 }
               },
               in: {$or: ["$$this", "$$value"]}
             }
           }
       }}
     ],
     as: "term_obj"
  }}
])

Playground