I am trying to set up the sebp/elk Docker container to run the ELK stack on my machine. The goal is to use ELK to collect, parse, and search log files such as the Apache access and error logs, as well as the PHP error log written during PHP execution (multiline errors with stack traces).

An example of the PHP error log I am trying to parse:

[03-Jun-2020 00:39:11 Europe/Berlin] PHP Stack trace:
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   1. {main}() /var/www/myserver.domain/html/index.php:0
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   2. require() /var/www/myserver.domain/html/index.php:17
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   3. require_once() /var/www/myserver.domain/html/wp-blog-header.php:16
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   4. include() /var/www/myserver.domain/html/wp-includes/template-loader.php:27
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   5. the_content() /var/www/myserver.domain/html/wp-content/themes/summer_freedom/index.php:20
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   6. apply_filters() /var/www/myserver.domain/html/wp-includes/post-template.php:79
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   7. call_user_func_array:{/var/www/myserver.domain/html/wp-includes/plugin.php:163}() /var/www/myserver.domain/html/wp-includes/plugin.php:163
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   8. searchnggallerytags() /var/www/myserver.domain/html/wp-includes/plugin.php:163

I use Filebeat to send the log from my local machine to the Logstash container, with the following filebeat.yml config:

output:
  logstash:
    enabled: true
    hosts:
      - localhost:5044
    ssl:
      certificate_authorities:
        - /etc/filebeat/logstash-beats.crt
    timeout: 15

filebeat:
  prospectors:

    -
      paths:
        - /var/log/php/php_errors.log
      document_type: php-errors

The Logstash configuration I have come up with so far for the ELK container is:

input {
    stdin {
        codec => multiline {
            pattern => "^\[%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} (?<tzname>[a-zA-Z]+/[a-zA-Z]+)\]"
            negate => true
            what => "previous"
            auto_flush_interval => 10
        }
        type => "php-errors"
    }
}

filter {
  if [type] == "php-errors" {
    grok {
        match => { "message" => "(?m)\[(?<logtime>%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME} (?<tzname>[a-zA-Z]+/[a-zA-Z]+))\] ?%{GREEDYDATA:message}" }
        overwrite => [ "message" ]
    }

    date {
        match => [ "logtime", "dd-MMM-yyyy HH:mm:ss" ]
        remove_field => [ "logtime" ]
    }
  }
}

output {
    stdout {
        codec => rubydebug
    }
}

At first I was not sure whether the pattern would match, so I checked it in the Grok Debugger in Kibana against lines from the log file, and it does match.

With this configuration in Logstash inside the sebp/elk container, I can see entries in Kibana, so the transfer via Filebeat works and Logstash processes the data. Unfortunately, I get one event in Kibana for every line of the PHP error log, whereas I would like all lines that belong together to be concatenated and stored as a single event.

As far as I understand the patterns here, Logstash should use the identical timestamp on each line to combine the lines into one message via multiline instead of creating several events.

So my question is: am I using the configuration incorrectly, or is something missing that would give me one event instead of many?

Update: as requested by @leandrojmp, I updated the Logstash configuration as suggested, but when running Logstash on the CLI I still get output like the following on stdout for every line of php_errors.log:

{
          "host" => {
        "name" => "myserver.domain"
    },
      "@version" => "1",
    "@timestamp" => 2020-06-03T21:54:53.886Z,
       "message" => "[03-Jun-2020 23:54:49 Europe/Berlin] PHP   1. {main}() /var/www/myserver.domain/html/index.php:0",
          "beat" => {
         "version" => "6.4.3",
        "hostname" => "myserver.domain",
            "name" => "myserver.domain"
    },
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
        "offset" => 15896045,
        "source" => "/var/log/php/php_errors.log"
}

So it looks like multiline matching is not working in Logstash for me.

Update 2: after some more research I found out that matching multiline content in Logstash is not recommended, because it can mix lines from different logs into one message when several machines send logs to the same Logstash instance. The suggested approach is to merge multiline messages in filebeat.yml before sending them to Logstash.
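
A minimal sketch of what that could look like in filebeat.yml, assuming the prospector from above and using the "PHP Stack trace" line from the sample log as the start-of-event marker. The multiline options (pattern, negate, match) are Filebeat's own settings; the choice of pattern is an assumption for this particular log and has not been tested against a full php_errors.log:

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/php/php_errors.log
      multiline:
        # Assumption: a new event starts at the line containing "PHP Stack trace".
        pattern: 'PHP Stack trace'
        # negate: true  -> lines that do NOT match the pattern are continuation lines
        negate: true
        # match: after  -> continuation lines are appended after the matching line
        #                  (the counterpart of what => "previous" in Logstash)
        match: after
```

With the merging done in Filebeat, the multiline codec would no longer be needed on the Logstash input. Note that with this pattern every line that does not contain "PHP Stack trace" is attached to the preceding trace, so unrelated PHP messages written between two traces would end up in the same event; a stricter pattern may be needed for mixed logs.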

1 Answer

Your multiline pattern is not right. With negate => true and what => "previous", every line that does not match the pattern is treated as a continuation and appended to the previous event. In your example every line starts with the same timestamp pattern, so every line matches, none is treated as a continuation, and you never get a multiline event.

The pattern needs to match something that occurs only at the start of a multiline event; in your case that could be the string "PHP Stack trace".

Change your multiline codec to this:

codec => multiline {
    pattern => "PHP Stack trace"
    negate => true
    what => "previous"
}

This gives the following result:

{
      "@version" => "1",
          "tags" => [
        [0] "multiline"
    ],
    "@timestamp" => 2020-06-02T22:39:11.000Z,
        "tzname" => "Europe/Berlin",
          "type" => "php-errors",
       "message" => "PHP Stack trace:\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   1. {main}() /var/www/myserver.domain/html/index.php:0\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   2. require() /var/www/myserver.domain/html/index.php:17\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   3. require_once() /var/www/myserver.domain/html/wp-blog-header.php:16\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   4. include() /var/www/myserver.domain/html/wp-includes/template-loader.php:27\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   5. the_content() /var/www/myserver.domain/html/wp-content/themes/summer_freedom/index.php:20\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   6. apply_filters() /var/www/myserver.domain/html/wp-includes/post-template.php:79\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   7. call_user_func_array:{/var/www/myserver.domain/html/wp-includes/plugin.php:163}() /var/www/myserver.domain/html/wp-includes/plugin.php:163\n[03-Jun-2020 00:39:11 Europe/Berlin] PHP   8. searchnggallerytags() /var/www/myserver.domain/html/wp-includes/plugin.php:163",
          "host" => "logstash"
}

All your lines are now in the same event, and in Kibana the message field will look something like this:

PHP Stack trace:
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   1. {main}() /var/www/myserver.domain/html/index.php:0
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   2. require() /var/www/myserver.domain/html/index.php:17
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   3. require_once() /var/www/myserver.domain/html/wp-blog-header.php:16
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   4. include() /var/www/myserver.domain/html/wp-includes/template-loader.php:27
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   5. the_content() /var/www/myserver.domain/html/wp-content/themes/summer_freedom/index.php:20
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   6. apply_filters() /var/www/myserver.domain/html/wp-includes/post-template.php:79
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   7. call_user_func_array:{/var/www/myserver.domain/html/wp-includes/plugin.php:163}() /var/www/myserver.domain/html/wp-includes/plugin.php:163
[03-Jun-2020 00:39:11 Europe/Berlin] PHP   8. searchnggallerytags() /var/www/myserver.domain/html/wp-includes/plugin.php:163

You also need to fix your date filter: the logtime value includes the time zone name, so the date pattern has to account for it.

The correct filter would be:

date {
    match => [ "logtime", "dd-MMM-yyyy HH:mm:ss ZZZ" ]
    remove_field => [ "logtime" ]
}