I'm using Kettle v5.2, which supports the MongoDB aggregation pipeline in the MongoDB Input step. The query works for a small data set, but for larger data I need to add the allowDiskUse option to the query, and I can't figure out how to do this in Pentaho. I tested this option in the mongo shell and it works as expected.
http://docs.mongodb.org/manual/reference/method/db.collection.aggregate/
http://wiki.pentaho.com/display/EAI/MongoDB+Input#MongoDBInput-queryaggpipeline
This works in the MongoDB Input step:
[ {$unwind: "$friends"}, {$group : { '_id' : '$friends.id', name: {'$first': '$friends.name'} ,count: {$sum:1} } } ,{$sort: {count: -1}}, {$limit: 100} ]
This doesn't (the option is appended after the pipeline array):
[ {$unwind: "$friends"}, {$group : { '_id' : '$friends.id', name: {'$first': '$friends.name'} ,count: {$sum:1} } } ,{$sort: {count: -1}}, {$limit: 100} ] , {allowDiskUse: true}
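
For comparison, here is roughly what I run in the mongo shell, where allowDiskUse goes in as a second argument to aggregate() and is not part of the pipeline array. This is only a sketch: the collection name `users` and the helper `runAggregation` are placeholders I made up for this post, and it says nothing about how Pentaho passes the query.

```javascript
// Pipeline copied from the query above.
var pipeline = [
  { $unwind: "$friends" },
  { $group: { _id: "$friends.id", name: { $first: "$friends.name" }, count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 100 }
];

// Options are a separate document, passed as the second argument.
var options = { allowDiskUse: true };

// `collection` is anything exposing aggregate(pipeline, options),
// e.g. db.users in the mongo shell ("users" is a hypothetical name).
function runAggregation(collection) {
  return collection.aggregate(pipeline, options);
}

// In the mongo shell: runAggregation(db.users)
```

So in the shell the pipeline and the options are two separate arguments, which is the part I can't reproduce in the single query field of the MongoDB Input step.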