Elasticsearch: filter aggregation using bucket value

Question

Not sure how to formulate the question. I'm using Elasticsearch 2.2.

Let's start with an example of the dataset, made of 4 documents:

[
  {
    "header": {
      "called_entity": { "uuid": "a" },
      "coverage_entity": {},
      "sucessful_transfers": 1
    }
  },
  {
    "header": {
      "called_entity": { "uuid": "a" },
      "coverage_entity": { "uuid": "b" },
      "sucessful_transfers": 1
    }
  },
  {
    "header": {
      "called_entity": { "uuid": "b" },
      "coverage_entity": { "uuid": "a" },
      "sucessful_transfers": 1
    }
  },
  {
    "header": {
      "called_entity": { "uuid": "b" },
      "coverage_entity": { "uuid": "a" },
      "sucessful_transfers": 0
    }
  }
]

called_entity always has a uuid. coverage_entity can be empty, or have a uuid.

I use a script to aggregate on either called_entity.uuid or coverage_entity.uuid:

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "dim1": {
      "terms": {
        "script" : "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
        "size": 10
      },
      "aggs": {
        "successful_transfers": {
          "sum": {
            "field": "header.sucessful_transfers"
          }
        }
      }
    }
  }
}

So now, the aggregation has generated terms from either header.called_entity.uuid, or header.coverage_entity.uuid.
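To make the expected buckets concrete, here is a rough pure-Python simulation of what that aggregation computes over the sample documents above. It is only a sketch and does not talk to Elasticsearch: the flattened `called` / `coverage` / `transfers` keys and the `simulate_dim1` helper are made-up names for illustration, and it assumes a document is counted once per distinct uuid it carries.

```python
from collections import defaultdict

# Flattened copy of the sample documents (hypothetical field names).
docs = [
    {"called": "a", "coverage": None, "transfers": 1},
    {"called": "a", "coverage": "b", "transfers": 1},
    {"called": "b", "coverage": "a", "transfers": 1},
    {"called": "b", "coverage": "a", "transfers": 0},
]


def simulate_dim1(docs):
    """Bucket each document under every uuid it carries, summing transfers."""
    buckets = defaultdict(lambda: {"doc_count": 0, "successful_transfers": 0})
    for doc in docs:
        # Assumption: one count per distinct key; an empty coverage_entity adds none.
        keys = {doc["called"], doc["coverage"]} - {None}
        for key in keys:
            buckets[key]["doc_count"] += 1
            buckets[key]["successful_transfers"] += doc["transfers"]
    return dict(buckets)


buckets = simulate_dim1(docs)
```

The second document lands in both bucket "a" (via called_entity) and bucket "b" (via coverage_entity), which is why a bucket alone cannot tell which field its key came from.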

How can I filter a sub-aggregation using the value of the bucket key? For example, for each bucket I want to count how many documents got their uuid from header.called_entity.uuid only. Something like this:

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "dim1": {
      "terms": {
        "script" : "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
        "size": 10
      },
      "aggs": {
        "successful_transfers": {
          "sum": {
            "field": "header.sucessful_transfers"
          }
        },
        "from_called_entity": {
          "filter": {
            "term": { "header.called_entity.uuid": BUCKET_KEY }
          }
        }
      }
    }
  }
}

Andrei Stefan · Accepted Answer · 2016-07-29T14:12:09

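I'm not sure this is possible. As far as I know, the bucket key itself is only available as a sorting option, not as a value that a sub-aggregation can reference.

Could you use something like this instead, with a separate terms aggregation for each of the two fields?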

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "dim1": {
      "terms": {
        "script": "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
        "size": 10
      },
      "aggs": {
        "successful_transfers": {
          "sum": {
            "field": "header.sucessful_transfers"
          }
        }
      }
    },
    "called_entity_source": {
      "terms": {
        "field": "header.called_entity.uuid",
        "size": 10
      }
    },
    "coverage_entity_source": {
      "terms": {
        "field": "header.coverage_entity.uuid",
        "size": 10
      }
    }
  }
}

And the output will be something like this:

  "called_entity_source": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "a",
           "doc_count": 2
        },
        {
           "key": "b",
           "doc_count": 2
        }
     ]
  },
  "coverage_entity_source": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "a",
           "doc_count": 2
        },
        {
           "key": "b",
           "doc_count": 1
        }
     ]
  },
  "dim1": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "a",
           "doc_count": 4,
           "successful_transfers": {
              "value": 3
           }
        },
        {
           "key": "b",
           "doc_count": 3,
           "successful_transfers": {
              "value": 2
           }
        }
     ]
  }

If you really need the JSON in that specific shape, add a final step in your application that post-processes the result a bit. The result above does contain the info you need, but the keys from coverage_entity_source and called_entity_source are not nested under the dim1 aggregation.
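That post-processing step could look roughly like the Python sketch below. It takes the "aggregations" part of the response shown above and copies the per-field counts into each dim1 bucket. The function name `attach_source_counts` and the output keys `from_called_entity` / `from_coverage_entity` are assumptions for illustration, not anything Elasticsearch returns.

```python
def attach_source_counts(aggs):
    """Return dim1 buckets enriched with per-source document counts."""
    # Index each source aggregation by bucket key for quick lookup.
    called = {b["key"]: b["doc_count"]
              for b in aggs["called_entity_source"]["buckets"]}
    coverage = {b["key"]: b["doc_count"]
                for b in aggs["coverage_entity_source"]["buckets"]}
    merged = []
    for bucket in aggs["dim1"]["buckets"]:
        enriched = dict(bucket)  # shallow copy, the response stays untouched
        key = bucket["key"]
        # A key missing from a source aggregation is reported as 0 here.
        enriched["from_called_entity"] = {"doc_count": called.get(key, 0)}
        enriched["from_coverage_entity"] = {"doc_count": coverage.get(key, 0)}
        merged.append(enriched)
    return merged


# The aggregations from the output above, trimmed to the fields used.
aggs = {
    "called_entity_source": {"buckets": [
        {"key": "a", "doc_count": 2}, {"key": "b", "doc_count": 2}]},
    "coverage_entity_source": {"buckets": [
        {"key": "a", "doc_count": 2}, {"key": "b", "doc_count": 1}]},
    "dim1": {"buckets": [
        {"key": "a", "doc_count": 4, "successful_transfers": {"value": 3}},
        {"key": "b", "doc_count": 3, "successful_transfers": {"value": 2}}]},
}
merged = attach_source_counts(aggs)
```

One caveat: all three terms aggregations only return their top "size" buckets, so with more than 10 distinct uuids a dim1 key might be absent from one of the source aggregations even though matching documents exist. In that case the 0 above would be misleading, and raising "size" on the source aggregations is probably needed.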
