0
votes

Let's say I have the following documents

{colors: [red, blue, green]}
{colors: [yellow, brown, green]}
{colors: [purple]}
{colors: [red, blue, orange]}

I want to make a function that queries the collection for documents that have at least one of the colors in the input, which is an array. So for example, if I want to find a document with red, blue, and/or green, I would write

findColors([red, blue, green])

and it would return

{colors: [red, blue, green]}
{colors: [red, blue, orange]}
{colors: [yellow, brown, green]}

IN THE ABOVE ORDER, as the document with the most perfect match is first, and the least perfect match is last. My problem is that the number of elements in the array are arbitrary, so I don't know how to combine them into one query without a loop. The problem with just passing the array in directly is that it will only return

{colors: [red, blue, green]}

but I want as many matches as possible. Plus, order within the array shouldn't have to matter.

Is there a way to either 1) combine multiple independent queries into one big query 2) use some sort of pattern matching function on the array such that it creates every possible combination of the elements of the array, which I can then pass into the function.

Also, in order to keep the run time low, I'm thinking of just writing "Stop running after you find 10 documents"

1

1 Answers

1
votes

You don't need to loop queries or any such thing. Submitting multiple items to test for with a database is called an $or condition, or in this case it can be explicitly shortened to $in given the input list to match a similar list of values. This will return documents that contain any of the listed values against the compared field, which in this case is also an array.

The only other thing required is to match the common elements with a count of those in common. Here the $setIntersection operator of the aggregation framework can be used to match the elements along with $size to return the size of the matched list.

That effectively "scores" the document by matched elements which can be passed to $sort to return the most matches first. Then the only other thing asked is "stop at 10", which is what a $limit operation does for results.

As a complete example:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;

mongoose.connect('mongodb://localhost/test');

var data = [
  { "colors": [ "red", "blue", "green" ] },
  { "colors": [ "yellow", "brown", "green" ] },
  { "colors": [ "purple" ] },
  { "colors": [ "red", "blue", "orange" ] }
];


var colorSchema = new Schema({
  "colors": []
});

var Color = mongoose.model('Color',colorSchema);

function findColors(list,callback) {
  Color.aggregate(
    [
      { "$match": { "colors": { "$in": list } } },
      { "$project": {
        "colors": 1,
        "score": {
          "$size": {
            "$setIntersection": [ "$colors", list ]
          }
        }
      }},
      { "$sort": { "score": -1 } },
      { "$limit": 10 }
    ],
    callback
  );
}


async.series(
  [
    function(callback) {
      Color.remove({},callback);
    },
    function(callback) {
      async.each(data,function(item,callback) {
        console.log(item);
        Color.create(item,callback);
      },callback);
    },
    function(callback) {
      console.log("here");
      findColors(["red","blue","green"],function(err,result) {
        console.log(result);
        callback(err);
      });
    }
  ],
  function(err) {
    if (err) throw err;
    mongoose.disconnect();
  }
)

Which produces the output:

[ { _id: 55ff60dc683554e114f90bf2,
    colors: [ 'red', 'blue', 'green' ],
    score: 3 },
  { _id: 55ff60dc683554e114f90bf5,
    colors: [ 'red', 'blue', 'orange' ],
    score: 2 },
  { _id: 55ff60dc683554e114f90bf3,
    colors: [ 'yellow', 'brown', 'green' ],
    score: 1 } ]

So the document with the most matches is on top and those with less are further down. Also anything that did not match the list at all is excluded.