I am in the process of moving from mysql to mongodb. Started learning mongodb yesterday.
I have a big mysql table (over 4 million rows, with over 300 fields each) which I am moving to mongodb.
Let's assume, the products table have the following fields -
_id, category, and 300+ other fields.
To find the top 5 categories in the products along with their count, I have the following mysql query
Select category, count(_id) as N from products group by category order by N DESC limit 5;
I have an index on category field and this query takes around 4.4 sec in mysql.
Now, I have successfully moved this table to mongodb and this is my corresponding query for finding top 5 categories with their counts.
db.products.aggregate([{$group : {_id:"$category", N:{$sum:1}}},{$sort:{N: -1}},{$limit:5}]);
I again have an index on category but the query doesn't seem to be using it (explain : true says so) and it is also taking around 13.5 sec for this query.
Having read more about mongodb aggregation pipeline optimization, I found out that we need to use sort prior to aggregation for index to work but I am sorting on the derived field from aggregation so can't bring it before the aggregate function.
How do I optimize queries like these in mongodb?
========================================================================= Output of explain
db.products.aggregate([{$group : {_id:"$category",N:{$sum:1}}},{$sort:{N: -1}},{$limit:5}], { explain: true });
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"category" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.products",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : "$category",
"N" : {
"$sum" : {
"$const" : 1
}
}
}
},
{
"$sort" : {
"sortKey" : {
"N" : -1
},
"limit" : NumberLong(5)
}
}
],
"ok" : 1
}
explainquery please - Alex