My Mongo cursor returns documents like this:

{ 
  "_id":ObjectId("57558ee01807ce2f774569cc"),
  "description": "Lorem Ipnsun ....",
  "results":[
      {
         "name":"Alica James",
         "gender":"male"
      },
      {
         "name":"Alica James",
         "gender":"female"
      },
      {
         "name":"Alica James",
         "gender":"female"
      }
   ]
},
{ 
  "_id":ObjectId("57558ee01807ce2f774569c6"),
  "description": "Lorem Ipnsun ....",
  "results":[
      {
         "name":"Van Ban",
         "gender":"unclear"
      }
   ]
},
{ 
  "_id":ObjectId("57558ee01807ce2f774569c7"),
  "description": "Lorem Ipnsun ....",
  "results":[]
}

As you can see, the results key can be empty or can contain entries. Each entry has a name field and a gender field, which can be male, female or unclear. I want to find all documents in my collection, then go through each document and check the gender distribution for each name.

So for the name "Alica James" I want my query to return:

female_numbers_for_document = 2
male_numbers_for_document = 1
unclear_numbers_for_document = 0

For Van Ban:

female_numbers_for_document = 0
male_numbers_for_document = 0
unclear_numbers_for_document = 1

In Python, I started by fetching all the documents in the collection and iterating over each document in the cursor. I then declared some variables to hold the count for each gender, but this doesn't work: it only takes the first value and doesn't go through results. The code looks like this:

def find_gender_distribution(self):
    cursor = self.mongo.db[self.collection_name].find()
    for document in cursor:
        female_numbers_for_document = document.find({"results.gender": "female"}).count()
        male_numbers_for_document = document.find({"results.gender": "male"}).count()
        unclear_numbers_for_document = document.find({"results.gender": "unclear"}).count()

I don't know how to count how many entries inside results have the same gender. Please help.

1 Answer

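You are using the wrong method for this. You need the .aggregate() method, which gives access to the aggregation pipeline. The pipeline below unwinds the results array, counts each (name, gender) pair, and then pivots those counts into one document per name.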

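# Emit one document per entry of the "results" array
# (documents with an empty array are dropped by $unwind).
unwind1 = {"$unwind": "$results"}
# Count the occurrences of each (name, gender) pair.
group1 = {
    "$group": {
        "_id": {"name": "$results.name", "gender": "$results.gender"},
        "count": {"$sum": 1}
    }
}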
group2 = {
    "$group": {
        "_id": "$_id.name", 
        "nmale": {
            "$sum": {"$cond": [
                        {"$eq": ["$_id.gender", "male"]}, 
                        "$count", 
                        0
                    ]
            }
        }, 
        "nfemale": {
            "$sum": {"$cond": [
                        {"$eq": ["$_id.gender", "female"]}, 
                        "$count", 
                        0
                    ]
            }
        }, 
        # Anything that is neither "male" nor "female" counts as unclear.
        # $and is required here: an $or of two $ne tests is always true.
        "nunclear": {
            "$sum": {"$cond": [
                        {"$and": [
                            {"$ne": ["$_id.gender", "male"]},
                            {"$ne": ["$_id.gender", "female"]}
                        ]},
                        "$count",
                        0
                    ]
            }
        }
    }
}

pipeline = [unwind1, group1, group2]

def find_gender_distribution(self):
    collection = self.mongo.db[self.collection_name]
    cursor = collection.aggregate(pipeline)
    for document in cursor:
        print(document)  # or do something else with it

Printing each document from the cursor yields something like:

{ "_id" : "Alica James", "nmale" : 1, "nfemale" : 2, "nunclear" : 0 }
{ "_id" : "Van Ban", "nmale" : 0, "nfemale" : 0, "nunclear" : 1 }
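
Note that the pipeline groups by name across the whole collection. If you really need the counts separately for each document, as the variable names in the question suggest, counting on the client side may be enough. Here is a minimal sketch of that idea using only the standard library; the helper name gender_distribution and the GENDERS tuple are made up for illustration, and the sample documents are copied from the question with _id and description left out:

```python
from collections import Counter

# Assumption: these are the only three gender values of interest.
GENDERS = ("female", "male", "unclear")

def gender_distribution(document):
    """Count genders per name inside one document's "results" array."""
    per_name = {}
    for entry in document.get("results", []):
        counts = per_name.setdefault(entry["name"], Counter())
        counts[entry["gender"]] += 1
    # Report every gender for every name, using 0 where none was seen.
    return {
        name: {gender: counts[gender] for gender in GENDERS}
        for name, counts in per_name.items()
    }

# Sample documents shaped like the ones in the question.
documents = [
    {"results": [
        {"name": "Alica James", "gender": "male"},
        {"name": "Alica James", "gender": "female"},
        {"name": "Alica James", "gender": "female"},
    ]},
    {"results": [{"name": "Van Ban", "gender": "unclear"}]},
    {"results": []},
]

distributions = [gender_distribution(doc) for doc in documents]
for distribution in distributions:
    print(distribution)
```

In real code you would pass each document yielded by collection.find() to the helper instead of the hard-coded list. A document with an empty results array simply produces an empty dict.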