couchdb map/reduce view: counting only the most recent items

Question

I have the following documents. Time stamped positions of keywords.

{
  _id: willem-aap-1234,
  keyword:aap,
  position: 10,
  profile: { name: willem },
  created_at: 1234
},
{
  _id: willem-aap-2345,
  keyword:aap,
  profile: { name: willem },
  created_at: 2345
},
{
  _id: oliver-aap-1235,
  keyword:aap,
  profile: { name: oliver },
  created_at: 1235
},
{
  _id: oliver-aap-2346,
  keyword:aap,
  profile: { name: oliver },
  created_at: 2346
}

Finding the most recent keywords per profile.name can be done by:

map: function(doc) {
if(doc.profile)
    emit(
        [doc.profile.name, doc.keyword, doc.created_at], 
        { keyword : doc.keyword, position : doc.position, created_at: doc.created_at }
    );
}

reduce: function(keys, values, rered) {
  var r = values[0];
  for (var i=1; i<values.length; i++)
    if (r.created_at < values[i].created_at)
      r = values[i];
  return r;
}

And then query the database with

reduce : true,
group_level : 2,
startkey : [aname],
endkey : [aname,{}]

This gives me the most recent documents for the profile with name aname.

But now I want to count all most recent documents per keyword, and sum the positions. I cannot get my head around this trying to do it with map/reduce only.

My user case is:

find the most recent documents per profile.user, per keyword
count the number of unique profile.name's per keyword
sum the positions of the most recent document, per keyword

The only way I can make it work is using the following list function:

function(head, req) {
  var row;
  var counts = {};
  while (row = getRow()) {
    var v = row.value;
    var k = v.keyword;

    if (v.position) {
      if (!counts[k])
        counts[k] = { 
          position : 0,
          count : 0
        }
      counts[k].position += v.position;
      counts[k].count++;
    }
  }

  return JSON.stringify(counts);
}

Can anyone think of a better way to do this, using map/reduce only?

Thanks

Aurélien Aurélien · Accepted Answer · 2013-03-11T19:50:48

The meaning of some parts are still a bit cloudy (for example, what is a "position"?).

But from a pure formal point of view, it seems that your list creates an index on keyword while your map created an index on [profile, keyword, timestamp].

If you really need different indexes then you need several maps, one per index. The only exception is when you already have a map on [a,b,c], you can change the "group level" and get two other indexes: [a,b] and [a].

couchdb map/reduce view: counting only the most recent items

1 Answers