
I have documents in Elasticsearch with the following mapping:

"mappings": {
  "document": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      },
      "@version": {
        "type": "string"
      },
      "id_secuencia": {
        "type": "long"
      },
      "event": {
        "properties": {
          "elapsedTime": {
            "type": "double"
          },
          "requestTime": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "error": {
            "properties": {
              "errorCode": {
                "type": "string",
                "index": "not_analyzed"
              },
              "failureDetail": {
                "type": "string"
              },
              "fault": {
                "type": "string"
              }
            }
          },
          "file": {
            "type": "string",
            "index": "not_analyzed"
          },
          "messageId": {
            "type": "string"
          },
          "request": {
            "properties": {
              "body": {
                "type": "string"
              },
              "header": {
                "type": "string"
              }
            }
          },
          "responseTime": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "service": {
            "properties": {
              "operation": {
                "type": "string",
                "index": "not_analyzed"
              },
              "project": {
                "type": "string",
                "index": "not_analyzed"
              },
              "proxy": {
                "type": "string",
                "index": "not_analyzed"
              },
              "version": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "timestamp": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "user": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      },
      "type": {
        "type": "string"
      }
    }
  }
}

I need to retrieve a list of unique values of the field "event.file" (to show in a Kibana Data Table) that meet the following criteria:

  • There is more than one document with the same value for the field "event.file"

  • All the occurrences of that "event.file" value resulted in an error (the field "event.error.errorCode" exists in every one of those documents)

To do this, I've been testing a terms aggregation, which gives me one bucket per file name containing all of that file's documents. What I haven't managed to do is drop the buckets that don't meet the criteria above: if at least one document in a bucket has no error, the whole bucket should be discarded.
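To make the intended bucket-then-drop logic concrete, here is a minimal plain-Python sketch that mimics it on in-memory documents: group by "event.file" (roughly what the terms aggregation does), then keep only groups with at least two documents where every document has "event.error.errorCode". The helper names and the sample documents are hypothetical; they only follow the nesting of the mapping above and are not Elasticsearch APIs.

```python
from collections import defaultdict


def get_path(doc, dotted):
    """Follow a dotted field path such as 'event.file' through nested dicts.
    Returns None if any part of the path is missing."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def files_where_all_failed(docs):
    """Group documents by event.file (similar to a terms aggregation) and keep
    the file names with at least two documents, all of which have an error code."""
    buckets = defaultdict(list)
    for doc in docs:
        name = get_path(doc, "event.file")
        if name is not None:  # documents without event.file land in no bucket
            buckets[name].append(doc)

    selected = []
    for name, members in buckets.items():
        errors = sum(
            1 for d in members if get_path(d, "event.error.errorCode") is not None
        )
        # Criterion 1: more than one document; criterion 2: every one is an error.
        if len(members) >= 2 and errors == len(members):
            selected.append(name)
    return sorted(selected)


# Hypothetical sample documents shaped like the mapping above.
docs = [
    {"event": {"file": "a.xml", "error": {"errorCode": "E1"}}},
    {"event": {"file": "a.xml", "error": {"errorCode": "E2"}}},
    {"event": {"file": "b.xml", "error": {"errorCode": "E1"}}},
    {"event": {"file": "b.xml"}},
    {"event": {"file": "c.xml", "error": {"errorCode": "E3"}}},
]
print(files_where_all_failed(docs))  # ['a.xml']
```

Here "b.xml" is dropped because one of its two documents has no error, and "c.xml" is dropped because it occurs only once.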

Is this the correct approach or is there a better/easier way to get this type of result?

Thanks a lot.


1 Answer


After trying several queries, I found that the query below works for my purpose. The remaining problem is that this apparently cannot be done in Kibana, because Kibana does not support pipeline aggregations such as bucket_selector (see https://github.com/elastic/kibana/issues/4584).
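If the bucket_selector step cannot run where the results are displayed, one possible workaround (an assumption, not a Kibana feature) is to query Elasticsearch directly with just the terms aggregation and the "errors" filter sub-aggregation from the query in this answer, and then drop buckets in client code. The sketch below assumes the usual response shape for a terms aggregation with a filter sub-aggregation; the function name and sample response are hypothetical.

```python
def select_buckets(response, min_docs=2):
    """Keep terms buckets whose doc_count is at least min_docs and whose
    'errors' filter sub-aggregation matched every document in the bucket."""
    buckets = response["aggregations"]["file-events"]["buckets"]
    return [
        b["key"]
        for b in buckets
        if b["doc_count"] >= min_docs and b["errors"]["doc_count"] == b["doc_count"]
    ]


# Hypothetical response: a terms aggregation with a filter sub-aggregation.
response = {
    "aggregations": {
        "file-events": {
            "buckets": [
                {"key": "a.xml", "doc_count": 3, "errors": {"doc_count": 3}},
                {"key": "b.xml", "doc_count": 2, "errors": {"doc_count": 1}},
            ]
        }
    }
}
print(select_buckets(response))  # ['a.xml']
```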

{
  "query": {
    "bool": {
      "must": [
        {
          "filtered": {
            "filter": {
              "exists": {
                "field": "event.file"
              }
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "file-events": {
      "terms": {
        "field": "event.file",
        "size": 0,
        "min_doc_count": 2
      },
      "aggs": {
        "files": {
          "filter": {
            "exists": {
              "field": "event.file"
            }
          },
          "aggs": {
            "totalFiles": {
              "value_count": {
                "field": "event.file"
              }
            }
          }
        },
        "errors": {
          "filter": {
            "exists": {
              "field": "event.error.errorCode"
            }
          },
          "aggs": {
            "totalErrors": {
              "value_count": {
                "field": "event.error.errorCode"
              }
            }
          }
        },
        "exhausted": {
          "bucket_selector": {
            "buckets_path": {
              "total_files":"files>totalFiles",
              "total_errors":"errors>totalErrors"
            },
            "script": "total_errors == total_files"
          }
        }
      }
    }
  }
}
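A possible simplification, sketched here as a Python dict that is serialized to the JSON request body: inside a terms bucket on "event.file", every document has "event.file" by construction, so the "files" filter plus value_count should always equal the bucket's own document count. The sketch compares document counts through the special "_count" buckets_path instead, which also avoids miscounting if "event.error.errorCode" were ever multi-valued (value_count counts values, not documents). It assumes Elasticsearch 2.x (the mapping uses "string"/"not_analyzed") and swaps the "filtered" query, deprecated in 2.x, for a bool "filter" clause; the script string is kept unchanged from the query above.

```python
import json

# Sketch of a leaner request body, assuming Elasticsearch 2.x.
query = {
    "size": 0,
    "query": {
        "bool": {
            # Non-scoring filter clause in place of the deprecated "filtered" query.
            "filter": {"exists": {"field": "event.file"}}
        }
    },
    "aggs": {
        "file-events": {
            # "size": 0 and "min_doc_count": 2 are kept from the original query.
            "terms": {"field": "event.file", "size": 0, "min_doc_count": 2},
            "aggs": {
                # Single-bucket filter aggregation: documents that have an error code.
                "errors": {"filter": {"exists": {"field": "event.error.errorCode"}}},
                "exhausted": {
                    "bucket_selector": {
                        "buckets_path": {
                            # "_count" is the special path for a bucket's document count.
                            "total_files": "_count",
                            "total_errors": "errors>_count",
                        },
                        "script": "total_errors == total_files",
                    }
                },
            },
        }
    },
}

print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of a _search request; only the bucket_selector comparison differs in spirit from the original, which compared two value_count metrics.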

Again, if I'm missing something, feedback is appreciated :)