
I have Filebeat capturing logs from a uWSGI application running in Docker. The data is sent to Logstash, which parses it and forwards it to Elasticsearch.

Here is the logstash conf file:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "log" => "\[pid: %{NUMBER:worker.pid}\] %{IP:request.ip} \{%{NUMBER:request.vars} vars in %{NUMBER:request.size} bytes} \[%{HTTPDATE:timestamp}] %{URIPROTO:request.method} %{URIPATH:request.endpoint}%{URIPARAM:request.params}? => generated %{NUMBER:response.size} bytes in %{NUMBER:response.time} msecs(?: via sendfile\(\))? \(HTTP/%{NUMBER:request.http_version} %{NUMBER:response.code}\) %{NUMBER:headers} headers in %{NUMBER:response.size} bytes \(%{NUMBER:worker.switches} switches on core %{NUMBER:worker.core}\)" }
  }
  date {
    # 29/Oct/2018:06:50:38 +0700
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z"]
  }

  kv {
    source => "request.params"
    field_split => "&?"
    target => "request.query"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index"
  }
}

Everything was fine, but I've noticed that all values captured by the grok pattern are duplicated. Here is how it looks in Kibana:

[Kibana screenshot showing each grok-extracted field with its value duplicated]

Note that raw fields such as log, which are not grok output, are fine. I've seen that the kv filter has an allow_duplicate_values parameter, but it doesn't apply to grok.

What is wrong with my configuration? Also, is it possible to rerun grok patterns on existing data in elasticsearch?

1 Answer


Maybe your Filebeat is already doing the job and creating these fields.

Did you try adding this parameter to your grok?

overwrite => [ "request.ip", "request.endpoint", ... ]
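For reference, a minimal sketch of where the option goes, reusing a few of the field names from your pattern (the full match string is shortened here):

grok {
  # same match pattern as in your original filter, shortened for brevity
  match => { "log" => "\[pid: %{NUMBER:worker.pid}\] %{IP:request.ip} ..." }
  # overwrite tells grok to replace these fields if they already exist,
  # instead of appending a second value and turning them into arrays
  overwrite => [ "worker.pid", "request.ip", "request.endpoint", "response.code" ]
}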

To rerun grok on already-indexed data, you need to use the elasticsearch input plugin to read the data from ES and re-index it after grok.
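A minimal sketch of such a re-indexing pipeline, assuming the same host and index as in your output (the target index name test-index-reparsed is just an example, and the filter block reuses your existing filters):

input {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index"
    query => '{ "query": { "match_all": {} } }'
  }
}

filter {
  # reuse the grok / date / kv filters from your original pipeline here
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index-reparsed"
  }
}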