I am attempting to use Map/Reduce to perform partial merges into an existing collection. I have the MR working correctly but am having trouble returning only the merged results.
Here are the stats from the MR with the output mode set to reduce:
{
"result" : "calculation",
"timeMillis" : 222,
"counts" : {
"input" : 492,
"emit" : 920,
"reduce" : 64,
"output" : 435078
},
"ok" : 1.0
}
I would expect output to be the number of documents actually merged by this run, not the size of the entire target collection. Is there any way to do this?
I tried merging a modified:true flag into the target docs, so that a query could return only the documents that were modified in the target collection. After the query, I set the flag back to false.
While this works correctly, it starts thrashing the index because of the massive number of changes being made and then flipped back, so disk I/O shoots up and MR performance plummets.
Ideally, calling result.GetResults() from the C# driver would naturally return the documents that were modified by the MR without the need to use flags.
Update:
Specifically, I have one collection that is "write only" which the MR runs on to merge into a "read" collection.
If the write collection contained documents like
{
"_id":BsonId,
"key":"key1",
"valarray":["one"]
},
{
"_id":BsonId,
"key":"key2",
"valarray":["one"]
}
then running the MR into the empty "read" collection would yield
{
"_id":"key1",
"value":
{
"valarray":["one"]
}
},
{
"_id":"key2",
"value":
{
"valarray":["one"]
}
}
and I would expect that the counts would be: input = 2, emit = 2, reduce = 0, output = 2
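For concreteness, here is a minimal sketch of map and reduce functions that would produce the output above, assuming the value is simply the array wrapped in an object and the reduce concatenates arrays. The `runMapReduce` harness is a hypothetical in-memory stand-in used only to count the phases; it is not the MongoDB API.

```javascript
// Set by the harness below; stands in for the emit() the server provides.
let emit = null;

// Assumed map function: keyed on "key", carrying the array as the value.
function map() {
  emit(this.key, { valarray: this.valarray });
}

// Assumed reduce function: concatenates the arrays of all values for a key.
function reduce(key, values) {
  return { valarray: values.flatMap((v) => v.valarray) };
}

// Hypothetical in-memory harness that mimics how the counts are tallied.
function runMapReduce(docs) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map();
  emit = (key, value) => {
    counts.emit++;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    counts.input++;
    map.call(doc);
  }
  const out = [];
  for (const [key, values] of groups) {
    let value = values[0];
    // reduce is only invoked when a key has more than one value
    if (values.length > 1) {
      counts.reduce++;
      value = reduce(key, values);
    }
    out.push({ _id: key, value });
  }
  counts.output = out.length;
  return { counts, out };
}

const first = runMapReduce([
  { key: "key1", valarray: ["one"] },
  { key: "key2", valarray: ["one"] },
]);
console.log(first.counts); // { input: 2, emit: 2, reduce: 0, output: 2 }
```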
If a new document were then inserted into the write collection
{
"_id":BsonId,
"key":"key1",
"valarray":["two"]
}
then after the next MR run the "read" collection would be
{
"_id":"key1",
"value":
{
"valarray":["one", "two"]
}
},
{
"_id":"key2",
"value":
{
"valarray":["one"]
}
}
The counts are then: input = 1, emit = 1, reduce = 1, output = 2
Through the C# driver, calling result.GetResults() would then iterate over the whole target collection. The issue is that I do not want to iterate over the whole collection; I only want to iterate over the documents in the target collection that were modified by the MR. In this case, it should return "_id":"key1" but not "_id":"key2".
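The behaviour I am after can be sketched in plain JavaScript as follows. This is an in-memory illustration only, not the MongoDB or C# driver API: `mergeReduce` and the `target` map are hypothetical names, and the reduce function is assumed to concatenate arrays. The point is that the merge step already knows which `_id`s it wrote, so in principle it could report just those.

```javascript
// Assumed reduce function: concatenates the arrays of all values for a key.
function reduce(key, values) {
  return { valarray: values.flatMap((v) => v.valarray) };
}

// Hypothetical merge step: folds newly emitted values into the target the way
// the reduce output mode does, and returns the _ids written during this run.
function mergeReduce(target, emitted) {
  const touched = new Set();
  for (const { key, value } of emitted) {
    const existing = target.get(key);
    // A new key is inserted as-is; an existing key is re-reduced with the new value.
    target.set(key, existing === undefined ? value : reduce(key, [existing, value]));
    touched.add(key);
  }
  return [...touched];
}

const target = new Map();

// First run: both keys are new.
mergeReduce(target, [
  { key: "key1", value: { valarray: ["one"] } },
  { key: "key2", value: { valarray: ["one"] } },
]);

// Second run: only key1 receives a new value.
const modified = mergeReduce(target, [
  { key: "key1", value: { valarray: ["two"] } },
]);

console.log(modified);           // [ 'key1' ]
console.log(target.get("key1")); // { valarray: [ 'one', 'two' ] }
console.log(target.size);        // 2
```

Here the second run leaves two documents in the target but reports only key1 as modified, which is the set I would like GetResults() to iterate over.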