I have the following documents: timestamped positions of keywords.

{
  _id: willem-aap-1234,
  keyword:aap,
  position: 10,
  profile: { name: willem },
  created_at: 1234
},
{
  _id: willem-aap-2345,
  keyword:aap,
  profile: { name: willem },
  created_at: 2345
},
{
  _id: oliver-aap-1235,
  keyword:aap,
  profile: { name: oliver },
  created_at: 1235
},
{
  _id: oliver-aap-2346,
  keyword:aap,
  profile: { name: oliver },
  created_at: 2346
}

Finding the most recent keywords per profile.name can be done by:

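map: function(doc) {
  // Only index documents that carry a profile.
  if (doc.profile) {
    emit(
      [doc.profile.name, doc.keyword, doc.created_at],
      { keyword: doc.keyword, position: doc.position, created_at: doc.created_at }
    );
  }
}

reduce: function(keys, values, rereduce) {
  // Keep the value with the largest created_at. Input and output have the
  // same shape, so the same logic also applies on rereduce.
  var r = values[0];
  for (var i = 1; i < values.length; i++) {
    if (r.created_at < values[i].created_at) {
      r = values[i];
    }
  }
  return r;
}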

And then query the database with

reduce : true,
group_level : 2,
startkey : [aname],
endkey : [aname,{}]

This gives me the most recent documents for the profile with name aname.
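To make the effect of group_level : 2 concrete, here is a minimal sketch that replays the same map and reduce logic locally in plain JavaScript over the four documents above. It only approximates what CouchDB does on the server: groupReduce is a hypothetical helper, not a CouchDB API, and the startkey/endkey filtering and key ordering are not simulated.

```javascript
// Local simulation only; CouchDB runs map/reduce on the server.
const docs = [
  { _id: "willem-aap-1234", keyword: "aap", position: 10, profile: { name: "willem" }, created_at: 1234 },
  { _id: "willem-aap-2345", keyword: "aap", profile: { name: "willem" }, created_at: 2345 },
  { _id: "oliver-aap-1235", keyword: "aap", profile: { name: "oliver" }, created_at: 1235 },
  { _id: "oliver-aap-2346", keyword: "aap", profile: { name: "oliver" }, created_at: 2346 }
];

// Same logic as the map function, collecting rows instead of calling emit().
function mapDoc(doc, rows) {
  if (doc.profile) {
    rows.push({
      key: [doc.profile.name, doc.keyword, doc.created_at],
      value: { keyword: doc.keyword, position: doc.position, created_at: doc.created_at }
    });
  }
}

// Same logic as the reduce function: keep the value with the largest created_at.
function reduceLatest(values) {
  let r = values[0];
  for (let i = 1; i < values.length; i++) {
    if (r.created_at < values[i].created_at) r = values[i];
  }
  return r;
}

// Hypothetical helper: group rows by the first `level` key elements,
// then reduce each group.
function groupReduce(rows, level) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(row.key.slice(0, level));
    if (!groups.has(groupKey)) groups.set(groupKey, []);
    groups.get(groupKey).push(row.value);
  }
  return [...groups].map(([k, values]) => ({ key: JSON.parse(k), value: reduceLatest(values) }));
}

const mapped = [];
docs.forEach((doc) => mapDoc(doc, mapped));
const latest = groupReduce(mapped, 2);
console.log(JSON.stringify(latest));
```

Note that with the sample data the most recent document for willem has no position field, so the reduced value for that group carries no position either.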

But now I want to count the most recent documents per keyword and sum their positions. I cannot work out how to do this with map/reduce only.

My use case is:

  1. find the most recent document per profile.name, per keyword
  2. count the number of unique profile.name values per keyword
  3. sum the positions of the most recent documents, per keyword

The only way I can make it work is using the following list function:

function(head, req) {
  var row;
  var counts = {};
  // Each row is the most recent document for one [profile.name, keyword] pair.
  while ((row = getRow())) {
    var v = row.value;
    var k = v.keyword;

    // Rows without a (truthy) position are skipped.
    if (v.position) {
      if (!counts[k]) {
        counts[k] = {
          position : 0,
          count : 0
        };
      }
      counts[k].position += v.position;
      counts[k].count++;
    }
  }

  return JSON.stringify(counts);
}
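For anyone who wants to try this aggregation outside CouchDB, here is a minimal sketch that runs the same loop against a stubbed getRow. The rows, the positions and the keyword "noot" are made-up sample data, and makeGetRow and summarize are hypothetical names used only for this illustration.

```javascript
// Hypothetical rows, shaped like the group_level=2 output of the view.
const sampleRows = [
  { key: ["oliver", "aap"], value: { keyword: "aap", position: 3, created_at: 2346 } },
  { key: ["willem", "aap"], value: { keyword: "aap", position: 7, created_at: 2345 } },
  { key: ["willem", "noot"], value: { keyword: "noot", position: 2, created_at: 2400 } }
];

// Stand-in for CouchDB's getRow(): returns the next row, or a falsy value when done.
function makeGetRow(rows) {
  let i = 0;
  return () => rows[i++];
}

// Same aggregation as the list function, with getRow passed in as an argument.
function summarize(getRow) {
  const counts = {};
  let row;
  while ((row = getRow())) {
    const v = row.value;
    const k = v.keyword;
    if (v.position) {
      if (!counts[k]) counts[k] = { position: 0, count: 0 };
      counts[k].position += v.position;
      counts[k].count++;
    }
  }
  return counts;
}

const summary = summarize(makeGetRow(sampleRows));
console.log(JSON.stringify(summary));
// prints {"aap":{"position":10,"count":2},"noot":{"position":2,"count":1}}
```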

Can anyone think of a better way to do this, using map/reduce only?

Thanks

1 Answer

The meaning of some parts is still a bit cloudy (for example, what is a "position"?).

But from a purely formal point of view, it seems that your list function builds an index on keyword, while your map builds an index on [profile, keyword, timestamp].

If you really need different indexes, then you need several maps, one per index. The only exception is that when you already have a map on [a,b,c], you can change the group_level and get two other indexes: [a,b] and [a].
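As a rough illustration of the group_level point, the sketch below mimics grouping by key prefix in plain JavaScript. The keys are made-up sample data, and groupCount is a hypothetical helper with a reduce that just counts rows; it is not how CouchDB implements grouping.

```javascript
// Hypothetical keys of the form [profile, keyword, timestamp].
const keys = [
  ["oliver", "aap", 1235],
  ["oliver", "aap", 2346],
  ["willem", "aap", 1234],
  ["willem", "noot", 2345]
];

// Group keys by their first `level` elements and count the rows in each group.
function groupCount(keys, level) {
  const out = new Map();
  for (const key of keys) {
    const prefix = JSON.stringify(key.slice(0, level));
    out.set(prefix, (out.get(prefix) || 0) + 1);
  }
  return [...out].map(([prefix, count]) => ({ key: JSON.parse(prefix), value: count }));
}

const byProfileKeyword = groupCount(keys, 2); // grouped by [profile, keyword]
const byProfile = groupCount(keys, 1);        // grouped by [profile]
console.log(JSON.stringify(byProfileKeyword));
console.log(JSON.stringify(byProfile));
```

Every grouping available this way is a prefix of the emitted key. Since keyword is the second element, no group level groups by keyword alone, which is why a view keyed with keyword first would be needed for that index.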