
In short: with Elasticsearch, given a list of fields, how can I get the average number of missing fields per document as an aggregation?


With the missing aggregation type I can get the total number of documents where a given field is missing. So with the following data:

    "hits": [
        {
            "name": "A name",
            "nickname": "A nickname",
            "bestfriend": "A friend",
            "hobby": "An hobby"
        },
        {
            "name": "A name",
            "hobby": "An hobby"
        },
        {
            "name": "A name",
            "nickname": "A nickname",
            "hobby": "An hobby"
        },
        {
            "name": "A name",
            "bestfriend": "A friend"
        }
    ]

I can run the following query:

    "aggs": {
        "name_missing": {
            "missing": {"field": "name"}
        },
        "nickname_missing": {
            "missing": {"field": "nickname"}
        },
        "hobby_missing": {
            "missing": {"field": "hobby"}
        },
        "bestfriend_missing": {
            "missing": {"field": "bestfriend"}
        }
    }

And I get the following aggregations:

    "aggregations": {
        "name_missing": {
            "doc_count": 0
        },
        "nickname_missing": {
            "doc_count": 2
        },
        "hobby_missing": {
            "doc_count": 1
        },
        "bestfriend_missing": {
            "doc_count": 1
        }
    }

What I need now is the average number of missing fields per document. I can do the math in code on the results:

  • sum the doc_count values of all the missing aggregations
  • divide by the total number of hits
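
For reference, the two steps above could look roughly like this on the client side. This is only a sketch: the `aggregations` dict is copied from the response shown earlier, while `total_hits` and the helper name `average_missing_fields` are assumptions made for illustration.

```python
# Aggregation results copied from the response shown in the question.
aggregations = {
    "name_missing": {"doc_count": 0},
    "nickname_missing": {"doc_count": 2},
    "hobby_missing": {"doc_count": 1},
    "bestfriend_missing": {"doc_count": 1},
}
total_hits = 4  # assumed: the number of documents in the sample data


def average_missing_fields(aggregations, total_hits):
    """Sum the doc_count of every missing aggregation, then divide by the hit count."""
    if total_hits == 0:
        return 0.0  # avoid dividing by zero when the query matches nothing
    total_missing = sum(agg["doc_count"] for agg in aggregations.values())
    return total_missing / total_hits


print(average_missing_fields(aggregations, total_hits))  # 1.0
```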

But how would you get the same result as an aggregation from Elasticsearch?

Thank you for any reply / suggestion.

Share your ES query. - Hatim Stovewala
@HatimStovewala question has been updated. Thank you! - Francesco Abeni

1 Answer


This is an ugly solution but it does the trick.

    GET missing/missing/_search
    {
      "size": 0,
      "aggs": {
        "result": {
          "terms": {
            "script": "'aaa'"
          },
          "aggs": {
            "name_missing": {
              "missing": {"field": "name"}
            },
            "nickname_missing": {
              "missing": {"field": "nickname"}
            },
            "hobby_missing": {
              "missing": {"field": "hobby"}
            },
            "bestfriend_missing": {
              "missing": {"field": "bestfriend"}
            },
            "avg_missing": {
              "bucket_script": {
                "buckets_path": {
                  "name_missing": "name_missing._count",
                  "nickname_missing": "nickname_missing._count",
                  "hobby_missing": "hobby_missing._count",
                  "bestfriend_missing": "bestfriend_missing._count",
                  "count": "_count"
                },
                "script": "(name_missing+nickname_missing+hobby_missing+bestfriend_missing)/count"
              }
            }
          }
        }
      }
    }

The terms aggregation with the constant script 'aaa' only serves to put every document into a single bucket, because bucket_script has to live inside a multi-bucket aggregation.

buckets_path is a way of defining variables. name_missing._count takes the doc_count of the name_missing aggregation, and the same goes for the others (nickname_missing, hobby_missing, bestfriend_missing). "count": "_count" takes the doc_count of the bucket the aggregation runs on, which here is the total number of hits.

The script adds up all the missing counts and divides the sum by the total number of hits, as you require. On recent Elasticsearch versions you may need to reference the variables as params.name_missing and so on.

I've shown you how to do it; now it's up to you to adjust the parameters and extract what you need.
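
Since the question starts from a list of fields, a request body like the one above could also be generated instead of written by hand. The following is a minimal sketch, not a definitive implementation: the function name `build_avg_missing_request` is hypothetical, it assumes field names are plain identifiers (no dots), and it keeps the bare variable names used in the script above, which newer Elasticsearch versions may want prefixed with `params.`.

```python
import json


def build_avg_missing_request(fields):
    """Build a search body with one `missing` agg per field plus a bucket_script."""
    if not fields:
        raise ValueError("at least one field is required")

    # One `missing` sub-aggregation per field, named "<field>_missing".
    sub_aggs = {f"{field}_missing": {"missing": {"field": field}} for field in fields}

    # Expose each sub-aggregation's doc_count, plus the bucket's own doc_count.
    buckets_path = {f"{field}_missing": f"{field}_missing._count" for field in fields}
    buckets_path["count"] = "_count"

    # Sum of all missing counts divided by the number of documents in the bucket.
    numerator = " + ".join(f"{field}_missing" for field in fields)
    sub_aggs["avg_missing"] = {
        "bucket_script": {
            "buckets_path": buckets_path,
            "script": f"({numerator}) / count",
        }
    }

    # The constant terms script puts every document into a single bucket.
    return {
        "size": 0,
        "aggs": {"result": {"terms": {"script": "'aaa'"}, "aggs": sub_aggs}},
    }


body = build_avg_missing_request(["name", "nickname", "hobby", "bestfriend"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request; the average is then read from avg_missing in the single bucket of the result aggregation.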