1
votes

I'm using elasticsearch with java api and I'm trying to get average value of lowest record from each bucket of term aggregation. One solution I found is to get results like this

AggregationBuilders.terms("group_by_flights").field("flight_id)
    .subAggregation(AggregationBuilders.min("minimum").field("duration")))

and then count average on the code side. The problem is that if there will be lot of result, it will allocate a lot of memory to count it. I would like to do this on elastic side. I found, that there is something like avg bucket pipeline aggregation, which can be add as sibling aggregation to terms (and others)

"the average": {
  "avg_bucket": {
    "buckets_path": "some_bucket_path" 
  }
}

Problem is that in java api you can add pipeline aggregation only as subaggregation. So if we construct our aggregation like this our terms aggregation won't be seen

AggregationBuilders.terms("group_by_flights").field("flight_id")
    .subAggregation(PipelineAggregatorBuilders.avgBucket("avg", "group_by_flights.duration" *<- this wont't be seen because its subaggregation*))

I was thinking about making some empty top aggregation and then add all aggregations as subaggregations, but it seems like silly walk-around, and I'm not understanding something correctly. Any ideas?

2

2 Answers

1
votes

The only solution I found so far is to make aggregations as sub aggregation of "empty aggregation"

AggregationBuilders.global("global_aggregation")
    .subAggregation((AggregationBuilders.terms("group_by_flights").field("flight_id"))
        .subAggregation(AggregationBuilders.min("min").field("duration")))
    .subAggregation(PipelineAggregatorBuilders.avgBucket("avg_bucket_aggs","group_by_flights>min"))
1
votes

My solution is use FilterAggregationBuilder to do it, this one can filtering data.The first sub aggregation to make data bucket, the second sub aggregation to merge bucket data.

AggregationBuilders.filter("global_aggregation", bool)
    .subAggregation((AggregationBuilders.terms("group_by_flights").field("flight_id"))
    .subAggregation(AggregationBuilders.min("min").field("duration")))
    .subAggregation(PipelineAggregatorBuilders.avgBucket("avg_bucket_aggs", "group_by_flights>min"));