ElasticSearch: Metric aggregation and doc values / field-data

Question

How does ES implement metric aggregations internally?

Suppose documents in the index have the structure below:

{
  "category": "A",
  "measure": 20
}

For the query below, which runs a terms aggregation on category and computes sum(measure), would the 'measure' values be extracted from each document (i.e. _source) and summed, or would they be read from the doc values / field data of the 'measure' field?

Query:

{
  "size": 0,
  "aggs": {
    "cat_aggs": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "sumAgg": {
          "sum": { "field": "measure" }
        }
      }
    }
  }
}

Val Val · Accepted Answer · 2016-01-11T05:57:57

From the official documentation on metrics aggregations (emphasis added):

The aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.

If you're using a newer ES 2.x version, doc values have replaced field data as the default. From the documentation on doc values:

All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space
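
As a rough illustration of the opt-out the documentation describes, here is a minimal sketch that builds a field-level mapping fragment with doc values turned off for the 'measure' field from the question. The `doc_values` mapping parameter is real; the variable name `measure_properties` is only for illustration, and where this fragment sits inside the full mapping body depends on your ES version, so check the mapping docs for your release before using it.

```python
import json

# Field-level mapping fragment; "measure" is the field from the question.
# Where this fragment is nested in the full mapping depends on the ES version.
measure_properties = {
    "properties": {
        "measure": {
            "type": "integer",
            "doc_values": False,  # opt out of the on-disk column for this field
        }
    }
}

# Print the JSON that would go into the mapping request body.
print(json.dumps(measure_properties, indent=2))
```

With a mapping like this, a sum on 'measure' can no longer be served from doc values, which is why the documentation only recommends it for fields you never sort or aggregate on.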

So, to answer your question directly: metrics aggregations are computed from either doc values (written to disk at indexing time) or field data (built in memory from the index when first needed), not by parsing _source at query time, unless you're doing it from a script that accesses _source directly.
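
To make the difference concrete, here is a toy Python model of the two access paths. It is not how Elasticsearch is implemented: the sample documents and all function names are made up for illustration, and the "columns" are plain lists standing in for doc values.

```python
import json
from collections import defaultdict

# Toy index: three stored documents (the _source), kept as raw JSON strings.
SOURCES = [
    '{"category": "A", "measure": 20}',
    '{"category": "B", "measure": 5}',
    '{"category": "A", "measure": 7}',
]

def build_column(sources, field):
    """Index-time step: parse each document once and keep one value per doc."""
    return [json.loads(s)[field] for s in sources]

# Columns built once, up front; position i holds the value for doc i.
CATEGORY_COLUMN = build_column(SOURCES, "category")
MEASURE_COLUMN = build_column(SOURCES, "measure")

def sum_by_category_from_columns(doc_ids):
    """Query-time path modelled on doc values: look up by doc id, no JSON parsing."""
    totals = defaultdict(int)
    for doc_id in doc_ids:
        totals[CATEGORY_COLUMN[doc_id]] += MEASURE_COLUMN[doc_id]
    return dict(totals)

def sum_by_category_from_source(doc_ids):
    """Path modelled on a script reading _source: parse every matching document."""
    totals = defaultdict(int)
    for doc_id in doc_ids:
        doc = json.loads(SOURCES[doc_id])
        totals[doc["category"]] += doc["measure"]
    return dict(totals)

matching = range(len(SOURCES))
print(sum_by_category_from_columns(matching))  # {'A': 27, 'B': 5}
```

Both functions return the same totals; the point is that the column-based path does its parsing once up front, while the _source path repeats it for every matching document on every query.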
