0
votes

I have 7 billion documents in my collection. My data model is:

{ "_id" : ObjectId("515921e3bbea58d25eb07b22"), "serverId" : 0, "counterId" : 0, "inserted" : ISODate("2013-03-21T00:26:30Z"), "counterValue" : 0.03256159112788737, "min" : -29.967438408872113, "max" : 20.032561591127887 } .

I have two indexes, on serverId and on counterId, and my query is:

{$match: {'inserted': {$gte: new Date("2013-03-21T00:00:00Z"), $lt: new Date("2013-03-22T10:00:00Z")}}},
{$group: {_id: {serverId: '$serverId', counterId: '$counterId'}, avgValue: {$avg: '$counterValue'}}}

It finds the average counterValue, grouped by serverId and counterId, for documents inserted between specific dates.
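In plain JavaScript, what that $group stage with $avg computes amounts to the following sketch (the sample documents are made up for illustration, not taken from the real collection):

```javascript
// Minimal sketch of $group with $avg: bucket documents by (serverId, counterId)
// and average counterValue per bucket. Sample data is hypothetical.
const docs = [
  { serverId: 0, counterId: 0, counterValue: 2 },
  { serverId: 0, counterId: 0, counterValue: 4 },
  { serverId: 1, counterId: 0, counterValue: 10 },
];

const groups = new Map(); // key "serverId|counterId" -> { sum, count }
for (const d of docs) {
  const key = `${d.serverId}|${d.counterId}`;
  const g = groups.get(key) || { sum: 0, count: 0 };
  g.sum += d.counterValue;
  g.count += 1;
  groups.set(key, g);
}

const result = [...groups.entries()].map(([key, g]) => {
  const [serverId, counterId] = key.split('|').map(Number);
  return { serverId, counterId, avgValue: g.sum / g.count };
});
console.log(result); // [{serverId: 0, counterId: 0, avgValue: 3}, {serverId: 1, counterId: 0, avgValue: 10}]
```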

But it takes 13 seconds; I need it to take 1 second.

How can I do that ?

3
What's your working set size vs RAM? You'll absolutely need to index the inserted field if you want to avoid a full collection scan. – WiredPrairie

3 Answers

1
votes

You query on the inserted field, so the inserted field should have an index. The other indexes are useless for this query.

If you want a fully covered query, which improves performance a lot, you should also put the fields you retrieve into the index. In that case MongoDB can read the values from the index itself, instead of having to fetch each individual document (probably from disk, given that number of documents), which would slow the query down.

So you should add a $project stage to the aggregation pipeline. The index should be:

{inserted:1, serverId:1, counterId:1, counterValue:1}

And query:

{$match: {'inserted': {$gte: new Date("2013-03-21T00:00:00Z"), $lt: new Date("2013-03-22T10:00:00Z")}}},
{$project: {'serverId': 1, 'counterId': 1, 'counterValue': 1, '_id': 0}},
{$group: {_id: {serverId: '$serverId', counterId: '$counterId'}, avgValue: {$avg: '$counterValue'}}}

Also notice the _id: 0. By default the _id field is returned, and since it is not in the index, returning it would force a fetch of each individual document, which is exactly what you want to avoid.
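For completeness, creating that index in the mongo shell would look like this (the collection name counters is an assumption; substitute your own):

```
db.counters.ensureIndex({ inserted: 1, serverId: 1, counterId: 1, counterValue: 1 })
```

Note that the range field (inserted) comes first here so the $match can use it; the remaining fields let the index cover the $project and $group.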

0
votes

Why don't you add an index on the inserted field as well? It will give you some performance gain. An index on inserted will let you fetch the matching data fast, and as specified in the documentation here: http://docs.mongodb.org/manual/applications/aggregation/ , the index is going to be used by a $match placed at the start of the pipeline. So add an index on the inserted field.

0
votes

Your index is useless for:

{$match:{ 'inserted':{$gte: new Date("2013-03-21T00:00:00Z") , $lt: new Date("2013-03-22T10:00:00Z") } } }, 
{$group : {_id:{serverId:'$serverId', counterId:'$counterId'} ,avgValue:{ $avg : '$counterValue'}}}

$group works in memory and does not use an index; $match, however, can. If you put an index on the inserted field, as @Devesh said, you should get much better performance.

That aside, if the matched set is huge, say 1 million of those 7 billion records, you could still see terrible performance from the aggregation framework. There is no easy way around that; the query is slow because of how much you are aggregating.

A better alternative might be pre-aggregated reports: maintain the data in the form you need via some other mechanism (such as updating counters on save in the client), so that querying it becomes cheap.
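A pre-aggregated report can be sketched like this, an in-memory illustration rather than the MongoDB version (the key layout and function name are made up): on every save, a per-(serverId, counterId, day) accumulator is bumped, so the average becomes a cheap lookup instead of a scan over billions of documents.

```javascript
// In-memory sketch of a pre-aggregated report. Each sample bumps a running
// sum and count for its (serverId, counterId, day) bucket at write time.
const report = new Map(); // "serverId|counterId|day" -> { sum, count }

function recordSample(serverId, counterId, inserted, counterValue) {
  const day = inserted.toISOString().slice(0, 10); // e.g. "2013-03-21"
  const key = `${serverId}|${counterId}|${day}`;
  const acc = report.get(key) || { sum: 0, count: 0 };
  acc.sum += counterValue;
  acc.count += 1;
  report.set(key, acc);
}

recordSample(0, 0, new Date('2013-03-21T00:26:30Z'), 2);
recordSample(0, 0, new Date('2013-03-21T05:00:00Z'), 4);

const acc = report.get('0|0|2013-03-21');
console.log(acc.sum / acc.count); // 3
```

In MongoDB the same idea would be an update with $inc on a reports collection (with upsert enabled), run alongside each insert, so the daily totals are always current.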