
I have the following documents in an Elasticsearch index.

[{
        "_index": "ten2",
        "_type": "documents",
        "_id": "c323c2244a4a4c22_en-us",
        "_source": {
            "publish_details": [{
                    "environment": "603fe91adbdcff66",
                    "time": "2020-06-24T13:36:55.514Z",
                    "locale": "hi-in",
                    "user": "aadab2f531206e9d",
                    "version": 1
                },
                {
                    "environment": "603fe91adbdcff66",
                    "time": "2020-06-24T13:36:55.514Z",
                    "locale": "en-us",
                    "user": "aadab2f531206e9d",
                    "version": 1
                }
            ],
            "created_at": "2020-06-24T13:36:43.037Z",
            "_in_progress": false,
            "title": "Entry 1",
            "locale": "en-us",
            "url": "/entry-1",
            "tags": [],
            "uid": "c323c2244a4a4c22",
            "updated_at": "2020-06-24T13:36:43.037Z",
            "fields": []
        }
    },
    {
        "_index": "ten2",
        "_type": "documents",
        "_id": "c323c2244a4a4c22_mr-in",
        "_source": {
            "publish_details": [{
                "environment": "603fe91adbdcff66",
                "time": "2020-06-24T13:37:26.205Z",
                "locale": "mr-in",
                "user": "aadab2f531206e9d",
                "version": 1
            }],
            "created_at": "2020-06-24T13:36:43.037Z",
            "_in_progress": false,
            "title": "Entry 1 marathi",
            "locale": "mr-in",
            "url": "/entry-1",
            "tags": [],
            "uid": "c323c2244a4a4c22",
            "updated_at": "2020-06-24T13:37:20.092Z",
            "fields": []
        }
    }
]

I want an empty result ([]) for this data. Note that the uid of both documents is the same. I am using the following query:

{
    "query": {
        "bool": {
            "must": [{
                "bool": {
                    "must_not": [{
                        "bool": {
                            "must": [{
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.environment": "603fe91adbdcff66"
                                        }
                                    }
                                }
                            }, {
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.locale": "en-us"
                                        }
                                    }
                                }
                            }, {
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.locale": "hi-in"
                                        }
                                    }
                                }
                            }, {
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.locale": "mr-in"
                                        }
                                    }
                                }
                            }]
                        }
                    }]
                }
            }]
        }
    }
}

But the above query returns both documents. I want an empty result, because the uid is shared and, taken together, the documents for that uid contain all three locales in publish_details. Is there a way to get the correct result? Is there an aggregation query that would help here? This is just a sample; I have many documents to filter. Kindly help.

Please help me fix the above query to get valid results. - Suraj Dalvi
Your question is unclear. - Gibbs
@Gibbs I have shared my Elasticsearch documents and my query. I want an empty result, but my query returns all documents, so I need a query that gives an empty result. The query concerns publish_details.locale and publish_details.environment. - Suraj Dalvi
So the documents share the same uid, and you want to check for all 3 locales across them? - Gibbs
Yes, the value of the uid field is the same for both documents. - Suraj Dalvi

1 Answer

{
  "aggs": {
    "agg1": {
      "terms": {
        "field": "uid.raw"
      },
      "aggs": {
        "agg2": {
          "nested": {
            "path": "publish_details"
          },
          "aggs": {
            "locales": {
              "terms": {
                "field": "publish_details.locale"
              }
            }
          }
        }
      }
    }
  }
}

This query groups the documents by uid first, then by publish_details.locale.

It produces results like the following:

"aggregations": {
        "agg1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "c323c2244a4a4c22",
                    "doc_count": 2,
                    "agg2": {
                        "doc_count": 3,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                },
                                {
                                    "key": "mr-in",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "c323c2244rrffa4a4c22",
                    "doc_count": 1,
                    "agg2": {
                        "doc_count": 2,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }

I have three documents: two share the same uid and the third has a different one.

I will update the query further to remove the first result, where you have 3 buckets. You can also handle that filtering in application code.
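As a rough illustration of handling it in code, here is a minimal Python sketch that walks the aggregation response and picks out the uids whose documents together cover every required locale. The function name, the REQUIRED_LOCALES constant and the sample response are made up for this example, and the aggregation names are passed in because the queries in this answer use different ones (agg1/agg2 and uids/details). It reads the keys of the locales buckets rather than a count, so repeated locales under one uid do not affect the outcome.

```python
from typing import Any, Dict, List, Set

# Assumption: these are the three locales the question wants to check for.
REQUIRED_LOCALES: Set[str] = {"en-us", "hi-in", "mr-in"}


def uids_with_all_locales(
    aggregations: Dict[str, Any],
    required: Set[str] = REQUIRED_LOCALES,
    uid_agg: str = "agg1",
    nested_agg: str = "agg2",
) -> List[str]:
    """Return the uids whose nested locale buckets cover every required locale."""
    complete = []
    for bucket in aggregations[uid_agg]["buckets"]:
        locale_buckets = bucket[nested_agg]["locales"]["buckets"]
        found = {b["key"] for b in locale_buckets}
        if required <= found:  # subset test: every required locale is present
            complete.append(bucket["key"])
    return complete


# Trimmed-down copy of the aggregation output shown in this answer.
sample_aggregations = {
    "agg1": {
        "buckets": [
            {
                "key": "c323c2244a4a4c22",
                "doc_count": 2,
                "agg2": {
                    "doc_count": 3,
                    "locales": {
                        "buckets": [
                            {"key": "en-us", "doc_count": 1},
                            {"key": "hi-in", "doc_count": 1},
                            {"key": "mr-in", "doc_count": 1},
                        ]
                    },
                },
            },
            {
                "key": "c323c2244rrffa4a4c22",
                "doc_count": 1,
                "agg2": {
                    "doc_count": 2,
                    "locales": {
                        "buckets": [
                            {"key": "en-us", "doc_count": 1},
                            {"key": "hi-in", "doc_count": 1},
                        ]
                    },
                },
            },
        ]
    }
}

excluded = uids_with_all_locales(sample_aggregations)
print(excluded)  # ['c323c2244a4a4c22']
```

The uids returned here are the ones to drop. Everything else in the bucket list is what remains after the filter, which for the two documents in the question would be nothing.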

You can do that. 10k documents is fine, but with millions of documents you need enough resources to execute this.

{
  "size" : 0,
  "query":{
      "bool" :{
          "must_not":{
              "match":{
                "publish_details.environment":"603fe91adbdcff66"
              }
          }
      }
  },
  "aggs": {
    "uids": {
      "terms": {
        "field": "uid.raw"
      },
      "aggs": {
        "details": {
          "nested": {
            "path": "publish_details"
          },
          "aggs": {
            "locales": {
              "terms": {
                "field": "publish_details.locale"
              }
            },
            "unique_locales": {
                "value_count": {
                    "field": "publish_details.locale"
                }
            }
          }
        }
      }
    }
  }
}
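
One thing to watch with the query above: a terms aggregation returns only the top 10 buckets unless a size is given, so with many uids most of them would be missing from the response. The sketch below builds the aggregation part of the request as a Python dict with an explicit size. The function name and the max_uids parameter are hypothetical, and the must_not clause is left out to keep the example short.

```python
import json
from typing import Any, Dict


def build_locale_agg_body(max_uids: int = 10000) -> Dict[str, Any]:
    """Build the uid -> nested locale aggregation with an explicit bucket limit."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {
            "uids": {
                # Without "size", only the 10 largest uid buckets come back.
                "terms": {"field": "uid.raw", "size": max_uids},
                "aggs": {
                    "details": {
                        "nested": {"path": "publish_details"},
                        "aggs": {
                            "locales": {
                                "terms": {"field": "publish_details.locale"}
                            }
                        },
                    }
                },
            }
        },
    }


body = build_locale_agg_body(max_uids=10000)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body. For millions of uids a single large terms aggregation gets expensive, and paging through the keys with a composite aggregation is probably the safer route.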

Result:

"aggregations": {
        "uids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "c323c2244a4a4c22",
                    "doc_count": 2,
                    "details": {
                        "doc_count": 3,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                },
                                {
                                    "key": "mr-in",
                                    "doc_count": 1
                                }
                            ]
                        },
                        "unique_locales": {
                            "value": 3
                        }
                    }
                },
                {
                    "key": "c323c2244rrffa4a4c22",
                    "doc_count": 1,
                    "details": {
                        "doc_count": 2,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                }
                            ]
                        },
                        "unique_locales": {
                            "value": 2
                        }
                    }
                }
            ]