ElasticSearch multi level parent-child aggregation

Question

I have a three-level parent/child structure. Let's say:

Company -> Employee -> Availability

Since Availability (and also Employee) is updated frequently here, I chose a parent/child structure over nested documents. The search function works fine (all documents are in the correct shards).

Now I want to sort those results. Sorting them by metadata from the company (1st level) is easy, but I also need to sort by the 3rd level (availability).

I want a list of companies sorted by:

Distance from location given ASC
Rating DESC
Soonest availability ASC

For example:

Company A is 5 miles away, has a rating of 4, and the soonest any of its employees is available is in 20 hours.
Company B is also 5 miles away and also has a rating of 4, but the soonest any of its employees is available is in 5 hours.

Therefore the sort result needs to be B, A.
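To make the intended ordering concrete, here is a small in-memory Python sketch (plain Python, not Elasticsearch) of the three-key sort. The Company class and its field names are hypothetical; hours_until_available is assumed to already hold the soonest availability across all of a company's employees.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Company:
    name: str
    distance_miles: float
    rating: float
    hours_until_available: float  # hypothetical: soonest availability across the company's employees


def rank(companies: List[Company]) -> List[Company]:
    # Distance ASC, then rating DESC (negated so a plain ascending sort works),
    # then soonest availability ASC.
    return sorted(
        companies,
        key=lambda c: (c.distance_miles, -c.rating, c.hours_until_available),
    )


companies = [
    Company("A", distance_miles=5, rating=4, hours_until_available=20),
    Company("B", distance_miles=5, rating=4, hours_until_available=5),
]
print([c.name for c in rank(companies)])  # ['B', 'A']
```

Because A and B tie on distance and rating, only the third key decides the order, which is exactly why the grandchild data has to reach the sort.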

I would like to apply a separate weight to each of these criteria, so I started writing aggregations that I could later use in my custom_score script.

Full gist for creating the index, importing data and searching

Now, I've managed to write a query that actually returns results, but the availability aggregation bucket is empty. I'm also getting the results back too deeply structured; I would like to flatten them.

Currently I get back:

Company IDs -> Employee IDs -> first availability

I would like to have aggregation like:

Company IDs -> first availability

That way I could use my custom_score script to calculate the score and sort the companies properly.
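Until the query itself returns flat results, the flattening can be done client-side. A minimal Python sketch, assuming the nested buckets have already been parsed into plain dicts of company id -> employee id -> first availability (this shape and the ids are hypothetical, not the actual Elasticsearch response format):

```python
from datetime import datetime
from typing import Dict

# Hypothetical parsed shape: company id -> employee id -> that employee's first availability.
Nested = Dict[str, Dict[str, datetime]]


def flatten(nested: Nested) -> Dict[str, datetime]:
    """Collapse company -> employee -> first availability into company -> first availability."""
    flat: Dict[str, datetime] = {}
    for company_id, employees in nested.items():
        if employees:  # skip companies with no employee availability at all
            flat[company_id] = min(employees.values())  # earliest across all employees
    return flat


nested = {
    "A": {"e1": datetime(2014, 12, 23, 6, 0), "e2": datetime(2014, 12, 24, 9, 0)},
    "B": {"e3": datetime(2014, 12, 22, 15, 0)},
    "C": {},
}
print(flatten(nested))
```

Taking the minimum over each company's employees is what turns the per-employee "first availability" into a per-company one.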

A more simplified question:
How can one sort or aggregate by multi-level (grand)children, and possibly flatten the result?

Could you add your mapping and a few example docs (with descendants) to the gist? It's hard to see how to invent fake docs that allow adequate testing of your system. — Sloan Ahrens
Hey Sloan - I've added the mapping and sample results. I've stripped it down a bit to make it easier to understand. The full stack has a lot more data in it :) Thanks! — Pete Minus
I had the same question. Albeit probably less performant, I just request all the results, which have a default sort of doc count. I then did my own recursive flattening, sorting and limiting, which wasn't ideal. — Matt Traynham
I've executed your gist, but when searching I get error 500 Query Failed [Failed to execute main query]]; nested: NullPointerException;. Can you run your gist in your local environment and make sure it works? Thanks! — Val
Why not create an equation for your results? Your data isn't fuzzy! Are you aggregating on every query? An aggregation is an input action, not a query or an output. One question: how do you check that these results are correct? — dsgdfg

Peter Dixon-Moses · Accepted Answer · 2015-08-11T17:37:18

You don't need aggregations to do this:

These are the sort criteria:

1. Distance ASC (company.location)
2. Rating DESC (company.rating_value)
3. Soonest future availability ASC (company.employee.availability.start)

If you ignore #3, then you can run a relatively simple company query like this:

GET /companies/company/_search
{
  "query": { "match_all": {} },
  "sort": [
    {
      "_script": {
        "params": {
          "lat": 51.5186,
          "lon": -0.1347
        },
        "lang": "groovy",
        "type": "number",
        "order": "asc",
        "script": "doc['location'].distanceInMiles(lat,lon)"
      }
    },
    { "rating_value": { "order": "desc" } }
  ]
}

#3 is tricky because you need to reach down (company > employee > availability), find the availability closest to the time of the request for each company, and use that duration as the third sort criterion.

We're going to use a function_score query at the grandchild level to put the time difference between the request time and each availability into the hit's _score. (Then we'll use the _score as the third sort criterion.)

To reach the grandchildren we need to use a has_child query inside a has_child query.

For each company we want the soonest-available Employee (and, of course, their closest Availability). Elasticsearch 2.0 will give us "score_mode": "min" for cases like this, but for now, since we're limited to "score_mode": "max", we'll make the grandchild _score the reciprocal of the time difference.

          "function_score": {
            "filter": { 
              "range": { 
                "start": {
                  "gt": "2014-12-22T10:34:18+01:00"
                } 
              }
            },
            "functions": [
              {
                "script_score": {
                  "lang": "groovy",
                  "params": {
                      "requested": "2014-12-22T10:34:18+01:00",
                      "millisPerHour": 3600000
                   },
                  "script": "1 / ((doc['availability.start'].value - new DateTime(requested).getMillis()) / millisPerHour)"
                }
              }
            ]
          }
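The arithmetic in that script_score can be checked outside Elasticsearch. This Python sketch mirrors it under stated assumptions: the helper names to_millis and reciprocal_hours_score are hypothetical, and the availability start is taken as epoch milliseconds, the way a date field's doc value is read in the script.

```python
from datetime import datetime

MILLIS_PER_HOUR = 3_600_000


def to_millis(iso_timestamp: str) -> int:
    """Parse an ISO-8601 timestamp with a UTC offset into epoch milliseconds."""
    return int(datetime.fromisoformat(iso_timestamp).timestamp() * 1000)


def reciprocal_hours_score(start_millis: int, requested_iso: str) -> float:
    """1 / (hours from the request time to the availability start).

    Assumes the start is strictly after the request time; in the query, the
    range filter ("gt" the request time) excludes past or simultaneous
    availabilities, so the divisor is never zero or negative there.
    """
    hours = (start_millis - to_millis(requested_iso)) / MILLIS_PER_HOUR
    return 1 / hours


requested = "2014-12-22T10:34:18+01:00"
print(reciprocal_hours_score(to_millis("2014-12-22T15:34:18+01:00"), requested))  # 0.2
print(reciprocal_hours_score(to_millis("2014-12-23T06:34:18+01:00"), requested))  # 0.05
```

An availability 5 hours out scores 0.2 and one 20 hours out scores 0.05, so the sooner slot gets the larger score.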

So now the _score for each grandchild (Availability) is 1 / number-of-hours-until-available, which means a higher _score indicates an earlier availability. With "score_mode": "max", each Employee takes the highest score (soonest availability) among their Availabilities, and each Company takes the highest score among its Employees.
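Why taking the maximum of reciprocals at both levels yields the soonest availability can be shown with a small Python model of the two nested "score_mode": "max" steps. The function company_score and the sample data are hypothetical illustrations, not Elasticsearch code.

```python
from typing import Dict, List


def company_score(employees: Dict[str, List[float]]) -> float:
    """Nested score_mode "max": best availability per employee, then best employee.

    Each float is one availability's score, 1 / hours until available.
    Raises ValueError if no employee has a matching availability; in the real
    query such a company would simply not match the has_child clause.
    """
    per_employee = [max(scores) for scores in employees.values() if scores]
    return max(per_employee)


company_a = {"e1": [1 / 20, 1 / 30], "e2": [1 / 48]}  # soonest: 20 hours
company_b = {"e3": [1 / 5, 1 / 72]}                   # soonest: 5 hours

# max(1 / h) picks the smallest h, so a higher score means sooner availability.
print(company_score(company_a), company_score(company_b))  # 0.05 0.2
```

Since max(1/h) equals 1/min(h), the company-level score is simply the reciprocal of its soonest availability, which is why it has to be sorted in descending order to get "soonest first".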

Putting it all together, we continue to query company but use company > employee > availability to generate the _score used as the #3 sort criterion:

GET /companies/company/_search
{
 "query": { 
    "has_child" : {
        "type" : "employee",
        "score_mode" : "max",
        "query": {
          "has_child" : {
            "type" : "availability",
            "score_mode" : "max",
            "query": {
              "function_score": {
                "filter": { 
                  "range": { 
                    "start": {
                      "gt": "2014-12-22T10:34:18+01:00"
                    } 
                  }
                },
                "functions": [
                  {
                    "script_score": {
                      "lang": "groovy",
                      "params": {
                          "requested": "2014-12-22T10:34:18+01:00",
                          "millisPerHour": 3600000
                       },
                      "script": "1/((doc['availability.start'].value - new DateTime(requested).getMillis()) / millisPerHour)"
                    }
                  }
                ]
              }
            }
          }
        }
    }
 },
 "sort": [
   {
     "_script": {
       "params": {
         "lat": 51.5186,
         "lon": -0.1347
       },
       "lang": "groovy",
       "type": "number",
       "order": "asc",
       "script": "doc['location'].distanceInMiles(lat,lon)"
     }
   },
   { "rating_value": { "order": "desc" } },
   { "_score": { "order": "desc" } }
 ]
}
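If the request time and location change on every search, it may be convenient to generate this body programmatically. Below is a minimal Python sketch that rebuilds the query above as a dict you could serialize and send with any HTTP client. The function build_company_query is hypothetical. The DSL it emits is the Elasticsearch 1.x syntax used in this answer (Groovy scripting was deprecated in later Elasticsearch releases). It sorts _score descending because a higher reciprocal score means sooner availability.

```python
import json


def build_company_query(lat: float, lon: float, requested_iso: str) -> dict:
    """Company search body: has_child -> has_child -> function_score (ES 1.x-era DSL)."""
    availability_query = {
        "function_score": {
            # Only future availabilities, so the divisor in the script stays positive.
            "filter": {"range": {"start": {"gt": requested_iso}}},
            "functions": [{
                "script_score": {
                    "lang": "groovy",
                    "params": {"requested": requested_iso, "millisPerHour": 3600000},
                    "script": "1/((doc['availability.start'].value - "
                              "new DateTime(requested).getMillis()) / millisPerHour)",
                }
            }],
        }
    }
    employee_query = {
        "has_child": {"type": "availability", "score_mode": "max", "query": availability_query}
    }
    return {
        "query": {"has_child": {"type": "employee", "score_mode": "max", "query": employee_query}},
        # A list keeps the three sort criteria in an explicit order.
        "sort": [
            {"_script": {
                "params": {"lat": lat, "lon": lon},
                "lang": "groovy",
                "type": "number",
                "order": "asc",
                "script": "doc['location'].distanceInMiles(lat,lon)",
            }},
            {"rating_value": {"order": "desc"}},
            {"_score": {"order": "desc"}},  # higher reciprocal score = sooner availability
        ],
    }


body = build_company_query(51.5186, -0.1347, "2014-12-22T10:34:18+01:00")
print(json.dumps(body, indent=2))
```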
