I'm trying to find duplicate contacts by first name, last name and email address. To do that, I use a composite aggregation with the fields firstName, lastName and emails.email. In the response, the values of the non-nested fields (firstName and lastName) are bucketed correctly, but the nested field emails.email has no value at all; it comes back as NULL: https://www.screencast.com/t/98CKr0I5

Am I missing something here? Any help would be greatly appreciated.

Below is an example document:

{
    "regionId": 10,
    "firstName": "John",
    "lastName": "mayer",
    "emails": [
      {
        "isPrimary": true,
        "email": "[email protected]"
      }
    ]
}
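The question doesn't show the index mapping, but the behavior described only happens when emails is mapped with "type": "nested". A mapping consistent with the document and the query might look like the sketch below. This is an assumption, not the poster's actual mapping, and it uses the typeless mapping syntax of Elasticsearch 7.x and later; the keyword subfields match the .keyword fields used in the query.

```json
PUT contacts
{
  "mappings": {
    "properties": {
      "regionId": { "type": "integer" },
      "firstName": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "lastName": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "emails": {
        "type": "nested",
        "properties": {
          "isPrimary": { "type": "boolean" },
          "email": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword" } }
          }
        }
      }
    }
  }
}
```

With this mapping each entry of emails is indexed as a separate hidden nested document, so emails.email.keyword has no values on the root contact document.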

And this is the query I'm running against Elasticsearch:

GET contacts/_search
{
  "size" : 0,
  "query" : {
    "term" : {
      "regionId" : {
        "value" : 10,
        "boost" : 1.0
      }
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "firstNameField" : {
              "terms" : {
                "field" : "firstName.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          },
          {
            "lastNameField" : {
              "terms" : {
                "field" : "lastName.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          },
          {
            "emailField" : {
              "terms" : {
                "field" : "emails.email.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      },
      "aggregations" : {
        "having.3483" : {
          "bucket_selector" : {
            "buckets_path" : {
              "a0" : "_count"
            },
            "script" : {
              "source" : "InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.gt(params.a0,params.v0))",
              "lang" : "painless",
              "params" : {
                "v0" : 1
              }
            },
            "gap_policy" : "skip"
          }
        }
      }
    }
  }
}

Answer

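Unfortunately, that's not possible. All sources of a composite aggregation must live in the same nested context. A top-level composite only sees the root documents, which don't carry the values of the nested emails.email field, so with "missing_bucket": true every document lands in the null bucket for that source.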
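To illustrate the limitation, here is a sketch of moving the composite under a nested aggregation (the names emails_nested and by_email are hypothetical). The email source now returns real values, but the buckets are built from the nested email documents, which have no firstName or lastName, so those two sources could not be grouped alongside it in the same composite.

```json
GET contacts/_search
{
  "size": 0,
  "query": {
    "term": { "regionId": { "value": 10 } }
  },
  "aggregations": {
    "emails_nested": {
      "nested": { "path": "emails" },
      "aggregations": {
        "by_email": {
          "composite": {
            "size": 1000,
            "sources": [
              {
                "emailField": {
                  "terms": { "field": "emails.email.keyword", "order": "asc" }
                }
              }
            ]
          }
        }
      }
    }
  }
}
```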

I'd recommend extracting the primary email and storing it in a top-level field:

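GET contacts/_update_by_query
{
  "query": {
    "nested": {
      "path": "emails",
      "query": {
        "term": {
          "emails.isPrimary": true
        }
      }
    }
  },
  "script": {
    "source": """
      // Pick the email object flagged as primary (null if none is)
      def primary = ctx._source.emails.find(e -> e.isPrimary == true);
      if (primary != null) {
        // Copy it to a top-level field so a root-level composite can see it
        ctx._source.primary_email = primary.email;
      } else {
        // Leave documents without a primary email unchanged
        ctx.op = 'noop';
      }
    """,
    "lang": "painless"
  }
}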

Then run the composite aggregation on primary_email.keyword instead of emails.email.keyword.
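A sketch of the adjusted query follows. It assumes primary_email was added through default dynamic mapping, which creates a text field with a .keyword subfield; if you map it explicitly, adjust the field name. The question's sources are kept as-is except for the email field, and the SQL-generated bucket_selector is replaced by a hand-written one (named duplicates here, a hypothetical name) that keeps only groups containing more than one contact.

```json
GET contacts/_search
{
  "size": 0,
  "query": {
    "term": { "regionId": { "value": 10 } }
  },
  "aggregations": {
    "groupby": {
      "composite": {
        "size": 1000,
        "sources": [
          { "firstNameField": { "terms": { "field": "firstName.keyword", "missing_bucket": true, "order": "asc" } } },
          { "lastNameField": { "terms": { "field": "lastName.keyword", "missing_bucket": true, "order": "asc" } } },
          { "emailField": { "terms": { "field": "primary_email.keyword", "missing_bucket": true, "order": "asc" } } }
        ]
      },
      "aggregations": {
        "duplicates": {
          "bucket_selector": {
            "buckets_path": { "count": "_count" },
            "script": "params.count > 1"
          }
        }
      }
    }
  }
}
```

Composite results are paginated: if the response contains an after_key, repeat the request with "after" set to that value inside the composite to fetch the next page. Because the bucket_selector filters each page after it is built, a page may hold fewer than 1000 buckets while more pages remain.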