I have a MapReduce query that runs over a collection - mycollection - that currently holds 4 documents, each one with this same structure:

{           
    myobject: {
        key_field: "some_name",
        one_number: 15,
        other_number: 20
    },
    some_more_data: {}
}

key_field is not unique. In this example, I have 4 documents with key_field: "some_name" and around 400 in total.

The reduce() function performs some arithmetic on one_number and other_number, and is supposed to output the results to a new collection (my_mapreduce_collection):

var map = function() {
    emit(this.myobject.key_field, {
        field1: this.myobject.one_number, 
        field2: this.myobject.other_number
    });
};

var reduce = function(key, values) {
    var sum = 0;
    values.forEach(function(doc, idx) {

        //Output each iteration:
        print("Key: "+key+", Idx: "+idx+" --> "+JSON.stringify(doc));

        sum += (doc.field1 - doc.field2);
    });
    return sum;
};

var MR = {
  mapreduce: "my_mongodb_collection", 
  out:  "my_mapreduce_collection",
  map: map.toString(),
  reduce: reduce.toString()
};

However, I sometimes get NaN values for certain key_field values.

So I added the print() call to reduce(), and this is what it outputs:

...

Key: some_name, Idx: 0 --> {"one_number":15,"other_number":20}

Key: some_name, Idx: 1 --> {"one_number":10,"other_number":30}

Key: some_name, Idx: 0 --> 0

Key: some_name, Idx: 1 --> {"one_number":20,"other_number":40}

Key: some_name, Idx: 2 --> {"one_number":25,"other_number":50}

...

For some reason, I get a value "0" in between, instead of an object, and then the index restarts. This only happens on some documents. I have checked them, and they all look homogeneous.

Any idea on what might be happening?

Thank you!

"key_field" and "keyfield" are not the same thing. Typo? Or your basic mistake? - Neil Lunn
Sorry, it was a typo. I modified the original names to clarify the example. - user435943
Shouldn't that be this.myobject.["key_field"] ? - Neil Lunn
Anyhow, I would take a strong guess that your input query should be { "myobject.key_field": { "$exists": 1 } } as you are possibly running against things where that does not even exist. - Neil Lunn
You are using mapReduce incorrectly. The value you return from reduce should be the same structure as the value you emit from map. Though the way you are doing it may seem like it's working, as soon as you hit 100 records, you'll see how this breaks. - rob_james

1 Answer

You are using mapReduce incorrectly. The value you return from reduce should be the same structure as the value you emit from map. Though the way you are doing it may seem like it's working, as soon as you hit 100 records, you'll see how this breaks.

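Your index restarts because reduce can be called several times for the same key, with the result of an earlier call passed in as one of the values of a later call, which is where my previous comment comes in. The 0 in between is a number returned by an earlier reduce call, not an object emitted by map. Because the shape doesn't match, doc.field1 and doc.field2 are undefined for that entry and the sum becomes NaN.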
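
To see the re-reduce behaviour outside MongoDB, here is a minimal sketch in plain JavaScript (meant for Node.js, not the mongo shell). It takes the reduce from the question and feeds it by hand the way the print() output suggests: one call over two emitted values, then a second call whose first value is the number returned by the first. The sample numbers come from the output in the question; the exact batching is an assumption about what the server did.

```javascript
// reduce() from the question: it returns a number, but map() emits objects.
var reduce = function(key, values) {
    var sum = 0;
    values.forEach(function(doc) {
        sum += (doc.field1 - doc.field2);
    });
    return sum;
};

// First pass: two emitted values for the same key.
var partial = reduce("some_name", [
    { field1: 15, field2: 20 },
    { field1: 10, field2: 30 }
]);

// Second pass (re-reduce): the earlier result is fed back in as values[0].
// A number has no field1/field2, so undefined - undefined yields NaN.
var result = reduce("some_name", [
    partial,
    { field1: 20, field2: 40 },
    { field1: 25, field2: 50 }
]);

console.log(partial); // -25
console.log(result);  // NaN
```

Once a single NaN enters the sum, every later addition stays NaN, which is why the whole key ends up broken.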

You should have map and reduce work on the same shape, and use the finalize function to compute the sum from the reduced values.

var map = function() {
    emit(this.myobject.key_field, {
        field1: [this.myobject.one_number], 
        field2: [this.myobject.other_number]
    });
};

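var reduce = function(key, values) {
    // Same shape as the value emitted by map, so the result can be re-reduced.
    var res = {
        field1: [],
        field2: []
    };
    values.forEach(function(doc) {
        // doc is either an emitted value or the result of an earlier reduce call.
        res.field1 = res.field1.concat(doc.field1);
        res.field2 = res.field2.concat(doc.field2);
    });
    return res;
};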
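
The finalize step itself is not shown above, so here is a minimal sketch of what it could look like, assuming the goal is still the sum of (one_number - other_number) per key. The finalize body and the sample values are assumptions for illustration; the reduce is the one above, repeated so the snippet runs on its own in Node.js.

```javascript
// Same reduce as above: input and output share the shape {field1: [], field2: []}.
var reduce = function(key, values) {
    var res = { field1: [], field2: [] };
    values.forEach(function(doc) {
        res.field1 = res.field1.concat(doc.field1);
        res.field2 = res.field2.concat(doc.field2);
    });
    return res;
};

// Hypothetical finalize: runs once per key on the final reduced value.
var finalize = function(key, reduced) {
    var sum = 0;
    for (var i = 0; i < reduced.field1.length; i++) {
        sum += reduced.field1[i] - reduced.field2[i];
    }
    return sum;
};

// Simulate a re-reduce: the first result is passed back in with new values.
var partial = reduce("some_name", [
    { field1: [15], field2: [20] },
    { field1: [10], field2: [30] }
]);
var reduced = reduce("some_name", [
    partial,
    { field1: [20], field2: [40] },
    { field1: [25], field2: [50] }
]);

var total = finalize("some_name", reduced);
console.log(total); // -70
```

In MongoDB the function would go in the finalize field of the mapReduce command, next to map and reduce. If a key has only one emitted value, reduce is not called for it at all, but finalize still receives the array-shaped value from map, so the loop works either way.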