1
votes

I'm curious about the best approach to count the instances of a particular field, across all documents, in a given ElasticSearch index.

For example, if I've got the following documents in index goober:

{
    '_id':'foo',
    'field1':'a value',
    'field2':'a value'
},
{
    '_id':'bar',
    'field1':'a value',
    'field2':'a value'
},
{
    '_id':'baz',
    'field1':'a value',
    'field3':'a value'
}

I'd like to know something like the following:

{
    'index':'goober',
    'field_counts':
        'field1':3,
        'field2':2,
        'field3':1
}

Is this doable with a single query? or multiple? For what it's worth, I'm using python elasticsearch and elasticsearch-dsl clients.

I've successfully issued a GET request to /goober and retrieved the mappings, and am learning how to submit requests for aggregations for each field, but I'm interested in learning how many times a particular field appears across all documents.

Coming from using Solr, still getting my bearings with ES. Thanks in advance for any suggestions.

1

1 Answers

1
votes

The below will return you the count of docs with "field2":

POST /INDEX/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "field2"
        }
      }
    }
  }
}

And here is an example using multiple aggregates (will return each agg in a bucket with a count), using field exist counts:

POST /INDEX/_search
{
  "size": 0,
  "aggs": {
    "field_has1": {
      "filter": {
        "exists": {
          "field": "field1"
        }
      }
    },
    "field_has2": {
      "filter": {
        "exists": {
          "field": "field2"
        }
      }
    }
  }
}

The behavior within each agg on the second example will mimic the behavior of the first query. In many cases, you can take a regular search query and nest those lookups within aggregate buckets.