Elastic search - Getting an overall status out of multiple records and then counting by status

Question

I am using elasticsearch to index my data, which looks (sort of) like this:

[
 ...
 {"Name": "john",  "Part": "head",       "broken": true},
 {"Name": "john",  "Part": "shoulders",  "broken": false},
 {"Name": "john",  "Part": "knees",      "broken": false},
 {"Name": "john",  "Part": "toes",       "broken": false},
 {"Name": "steve", "Part": "head",       "broken": false},
 {"Name": "steve", "Part": "shoulders",  "broken": false},
 {"Name": "steve", "Part": "knees",      "broken": false},
 {"Name": "steve", "Part": "toes",       "broken": false}
 ...
]

Now I want to know the overall status of my people in the form of a counter showing how many are hurt and how many are fine. Here the name serves as a fingerprint, and a person is fine if none of his parts is broken:

People fine: 1
People hurt: 1

I tried using my (limited) knowledge of metrics and bucket aggregations but to no avail. Now I want to know:

Is it even possible to get this counter with an Elasticsearch query? And if yes, how can I construct such a query?

Joe - Elasticsearch Handbook · Accepted Answer · 2021-01-29T16:56:11

This can be solved either through a scripted_metric aggregation, which tends to be rather slow, or through a relatively complicated combination of bucket_selector and bucket_script aggregations.

Let's go with the latter. Assuming you have a .keyword mapping on the Name field, you could do:

POST index_name/_search
{
  "size": 0,
  "aggs": {
    "multibucket": {
      "filters": {
        "filters": {
          "placeholder": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "all_ppl_count": {
          "cardinality": {
            "field": "Name.keyword"
          }
        },
        "by_name": {
          "terms": {
            "field": "Name.keyword",
            "size": 100
          },
          "aggs": {
            "is_broken": {
              "terms": {
                "field": "broken",
                "size": 2
              }
            },
            "filter_fine_ppl": {
              "bucket_selector": {
                "buckets_path": {
                  "has_broken_something": "is_broken._bucket_count"
                },
                "script": "params.has_broken_something > 1"
              }
            }
          }
        },
        "healthy_ppl_count": {
          "bucket_script": {
            "buckets_path": {
              "all_ppl": "all_ppl_count",
              "hurt_ppl": "by_name._bucket_count"
            },
            "script": "params.all_ppl - params.hurt_ppl"
          }
        }
      }
    }
  }
}

We've used a filters aggregation solely as a multi-bucket placeholder that enables us to use the bucket_* pipeline aggregations. After that it's just:

counting all unique people (all_ppl_count),
grouping the people by name (by_name),
inside is_broken, bucketing each person's records by the value of broken and, via filter_fine_ppl, retaining that person's bucket in by_name only when both values occur, i.e. when at least one part is broken and at least one is not (a person whose parts are all broken yields a single is_broken bucket and would be dropped as well),
subtracting the number of retained (hurt) buckets in by_name from the count of all unique people, which gives the healthy people count (healthy_ppl_count).
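
To make the steps above easier to follow, here is a minimal sketch that emulates the same logic in plain Python on the sample records from the question. It does not talk to Elasticsearch; the function name count_by_status and the result keys "fine" and "hurt" are made up for illustration. Note also that the real cardinality aggregation is approximate on large data sets, whereas this count is exact.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"Name": "john",  "Part": "head",      "broken": True},
    {"Name": "john",  "Part": "shoulders", "broken": False},
    {"Name": "john",  "Part": "knees",     "broken": False},
    {"Name": "john",  "Part": "toes",      "broken": False},
    {"Name": "steve", "Part": "head",      "broken": False},
    {"Name": "steve", "Part": "shoulders", "broken": False},
    {"Name": "steve", "Part": "knees",     "broken": False},
    {"Name": "steve", "Part": "toes",      "broken": False},
]


def count_by_status(records):
    """Emulate the aggregation steps in plain Python (hypothetical helper)."""
    # by_name + is_broken: collect the distinct "broken" values per person.
    broken_values = defaultdict(set)
    for rec in records:
        broken_values[rec["Name"]].add(rec["broken"])

    # all_ppl_count: number of unique names.
    all_ppl = len(broken_values)

    # filter_fine_ppl: keep a person only if is_broken has more than one
    # bucket, mirroring "params.has_broken_something > 1" in the query.
    hurt_ppl = sum(1 for values in broken_values.values() if len(values) > 1)

    # healthy_ppl_count: all people minus the retained (hurt) people.
    return {"fine": all_ppl - hurt_ppl, "hurt": hurt_ppl}


print(count_by_status(docs))  # {'fine': 1, 'hurt': 1}
```

Because the check mirrors the "> 1" condition on the is_broken bucket count, a person whose records are all broken: true would be counted as fine here too, just as in the query.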

The response will then look something along the lines of:

"aggregations" : {
  "multibucket" : {
    "buckets" : {
      "placeholder" : {
        "doc_count" : 8,
        "all_ppl_count" : {         <--
          "value" : 2
        },
        "by_name" : {
          "buckets" : [
            { ... }
          ]
        },
        "healthy_ppl_count" : {     <--
          "value" : 1.0
        }
      }
    }
  }
}

from which we can easily obtain the "broken people count" by subtracting healthy_ppl_count from all_ppl_count.
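
As a rough sketch of that last subtraction on the client side, the snippet below reads the two values out of a response shaped like the one above. The response dict is hand-written rather than a real server reply: it assumes the two-person sample data (so all_ppl_count is 2) and the healthy_ppl_count key as named in the query. The helper name status_counts is hypothetical.

```python
# Hand-written stand-in for the "aggregations" part of the search response.
response = {
    "aggregations": {
        "multibucket": {
            "buckets": {
                "placeholder": {
                    "doc_count": 8,
                    "all_ppl_count": {"value": 2},
                    "healthy_ppl_count": {"value": 1.0},
                }
            }
        }
    }
}


def status_counts(resp):
    """Derive the fine/hurt counters from the aggregation response."""
    bucket = resp["aggregations"]["multibucket"]["buckets"]["placeholder"]
    total = int(bucket["all_ppl_count"]["value"])
    fine = int(bucket["healthy_ppl_count"]["value"])  # bucket_script returns a float
    return {"fine": fine, "hurt": total - fine}


print(status_counts(response))  # {'fine': 1, 'hurt': 1}
```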
