
I need to find how many documents, out of 10000 randomly selected documents, match a certain query.

MongoDB's $sample aggregation stage seems to be an efficient way to obtain random documents.

db.users.aggregate(
   [ { $sample: { size: 3 } } ]
)

But how can I run a query on the returned result?

I can obtain random ids via $sample and then do another query with $in for those ids, but I'm trying to learn whether there is a simpler way.

Update: More information

Other than the "_id" and "email" fields, all fields are user-defined; as in customer.io, you can add or remove other attributes.

  person

  {
    _id: "...",
    email: "[email protected]",
    facebook: "facebook page url",
    ... and a lot of other fields, which may or may not be present depending on the person
  }

The query is also going to be generated by the user, but for simplicity let's say: after selecting 10000 random documents I want to run

find({facebook: {$exists: true} }) 

on those selected documents.

Can you add a sample document from your users collection and the query you are planning to run after the $sample stage? - felix
@felix added more information - EastSw
The answer posted by p.streef is actually the right one. First get 10000 random documents with {$sample: {size: 10000}}, and then filter those documents with {$match: {facebook: {$exists: true}}} - felix

1 Answer


You should add a $match stage after $sample, followed by $count to get the number of matches:

db.users.aggregate([ 
{ $sample: { size: 3 } },
{ $match: { facebook: {$exists : true} } },
{ $count: "nr_matches" }
])

Read more on aggregation here: https://docs.mongodb.com/manual/aggregation/
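
Since the filter in the question is generated by the user, the pipeline can also be assembled from whatever filter comes in. Below is a minimal sketch in plain JavaScript; buildSampleCountPipeline and readCount are hypothetical helper names, not part of MongoDB. One thing to watch for: $count emits no document at all when nothing matches, so the reader falls back to 0.

```javascript
// Hypothetical helper (not a MongoDB API): builds the
// sample -> match -> count pipeline for an arbitrary filter.
function buildSampleCountPipeline(filter, sampleSize) {
  return [
    { $sample: { size: sampleSize } },  // pick random documents first
    { $match: filter },                 // then filter only the sampled ones
    { $count: "nr_matches" }            // then count what is left
  ];
}

// Hypothetical helper: extracts the count from the aggregation result.
// $count returns an empty result set when no document matched.
function readCount(results) {
  return results.length > 0 ? results[0].nr_matches : 0;
}

const pipeline = buildSampleCountPipeline({ facebook: { $exists: true } }, 10000);
// In the mongo shell this would be used roughly as:
//   readCount(db.users.aggregate(pipeline).toArray())
```

Keeping $sample as the first stage matters: putting $match first would instead sample from the documents that already match.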

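Edit: or, with a single $group stage, count both the documents that have the field and those that do not. Note that $exists is a query operator and is not valid inside $group, so the check is written with the $type aggregation operator, which returns "missing" for an absent field:

db.users.aggregate([ 
{ $sample: { size: 3 } },
{ $group: { _id : { hasFacebook : { $ne : [ { $type : "$facebook" }, "missing" ] } }, count : {$sum: 1}}}
])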
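
To make the semantics concrete: the filter is applied only to the sampled documents, never to the whole collection. Here is a rough in-memory simulation in plain JavaScript. sampleThenCount is a hypothetical name, and the shuffle is only a stand-in; it is not how MongoDB implements $sample.

```javascript
// In-memory stand-in for $sample followed by $match and $count.
function sampleThenCount(docs, size, predicate) {
  const pool = docs.slice();               // do not mutate the caller's array
  const n = Math.min(size, pool.length);   // cannot sample more than exists
  // Partial Fisher-Yates shuffle: the first n slots end up
  // holding a uniform random sample without replacement.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  // Filter only the sampled documents, then count them.
  return pool.slice(0, n).filter(predicate).length;
}

// Made-up example data for illustration
const people = [
  { _id: 1, facebook: "a" },
  { _id: 2 },
  { _id: 3, facebook: "b" },
  { _id: 4 }
];

// Plain-object equivalent of { facebook: { $exists: true } }
const hasFacebook = (doc) => "facebook" in doc;

// Result varies between runs because the sample is random
const matches = sampleThenCount(people, 2, hasFacebook);
```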