
I have an instance of Filebeat (version 7.5.0, running on a Windows Server) monitoring a local folder for log files and sending the data on to Logstash (version 7.5.0, running in a Docker container). In Logstash I would like to extract one of the folder names (the last one) and add it as a field.

As a concrete example: given two log entries, one from the file d:\\Logs\\Foo\\Bar\\lorem\\currentlog.txt and one from the file d:\\Logs\\Foo\\Bar\\ipsum\\currentlog.txt, I would like to extract the values lorem and ipsum, respectively.

For this I have the following (simplified example) set up:

input {
    pipeline { address => "test" }
}

filter {
    grok {
        match => { "source" => ".*\\\\.*\\\\(?<product>.*)\\\\.*" }
    }
}

output {
    stdout { codec => rubydebug }
}

I have tested the regular expression used to find a match (named product) on the source field in several places (grokconstructor, grokdebug and rubular), and they all yield the desired result: I get a named match for product with the expected value of the last folder in the path.

However, when I run Logstash with the above pipeline configuration, it does not manage to extract the folder name and put its value in the product field. Instead I see that a _grokparsefailure tag is added to the Logstash output, indicating that there is something wrong with my grok expression. But all my testing in the tools referenced above indicates that there is nothing wrong with it...

The full logstash output looks like this:

{
    "@version" => "1",
    "tags" => [
        [0]"beats_input_codec_plain_applied",
        [1]"_grokparsefailure"
    ],
    "host" => {
        "name" => "test"
    },
    "message" => "Another line in the log",
    "agent" => {
        "id" => "e00d2f50-b10c-406a-a4fa-be381d15b869",
        "ephemeral_id" => "28dfe105-b936-40de-bc97-16c4a9196e30",
        "hostname" => "my-host",
        "name" => "test",
        "type" => "filebeat",
        "version" => "7.5.0"
    },
    "@timestamp" => 2019 - 12 - 16T14: 04: 09.064Z,
    "ecs" => {
        "version" => "1.1.0"
    },
    "log" => {
        "file" => {
            "path" => "d:\\Logs\\Foo\\Bar\\ipsum\\currentlog.txt"
        },
        "offset" => 21
    },
    "input" => {
        "type" => "log"
    }
}

I have tried changing the match to be on the log.file.path property, but that gives me the same _grokparsefailure tag.

I am also pretty sure that this worked on an earlier installation of Filebeat/Logstash (perhaps one or two major versions back), but I can't remember exactly.

So the question is: Why isn't Logstash able to extract the folder name from the Filebeat source? And is there a way I can debug this grok problem any further?
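
To isolate the problem from the Filebeat transport, a minimal test pipeline like the following could be used to feed sample paths in by hand (a sketch; it applies the same pattern as above, but against the raw stdin line):

input {
    stdin { }
}

filter {
    grok {
        # Same pattern as in the pipeline above, applied to the raw
        # message so Filebeat is taken out of the equation.
        match => { "message" => ".*\\\\.*\\\\(?<product>.*)\\\\.*" }
    }
}

output {
    stdout { codec => rubydebug }
}

Typing d:\Logs\Foo\Bar\ipsum\currentlog.txt at the prompt then shows immediately whether the pattern matches or a _grokparsefailure tag appears.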


1 Answer


The above configuration failed for a combination of reasons, but I eventually managed to figure them out:

Firstly, there was no source field coming from Filebeat (I'm fairly sure there was one some versions ago, but that's a different story), which obviously results in a failing grok filter.
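
As a sanity check, a conditional can verify that the field actually exists before grokking it (a sketch; the source_missing tag is just an example name):

filter {
    if [source] {
        grok {
            match => { "source" => ".*\\\\.*\\\\(?<product>.*)\\\\.*" }
        }
    } else {
        # No source field on the event, so tag it explicitly instead
        # of letting grok emit a misleading _grokparsefailure.
        mutate { add_tag => ["source_missing"] }
    }
}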

Next, when I instead tried to grok on the log.file.path field, I was using the wrong syntax. The proper way to access a nested field is like so: [log][file][path].
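
The same bracket syntax applies wherever a field is referenced, for example when copying the nested value into a top-level field with sprintf-style interpolation (a sketch; path_copy is just an illustrative name):

filter {
    mutate {
        # [log][file][path] addresses the nested field; the dotted
        # name log.file.path would be looked up as a literal
        # top-level field with dots in its name.
        add_field => { "path_copy" => "%{[log][file][path]}" }
    }
}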

And finally, even though the output showed the value of log.file.path to be "d:\\Logs\\Foo\\Bar\\ipsum\\currentlog.txt", the double backslashes were evidently added somewhere in the output pipeline. So when I changed my regex to match single backslashes instead of double ones, it correctly extracted ipsum from "d:\Logs\Foo\Bar\ipsum\currentlog.txt".
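
One way to verify this (a sketch using a ruby filter; path_chars is a made-up field name) is to count the characters in the field: the single-backslash path d:\Logs\Foo\Bar\ipsum\currentlog.txt is 36 characters, whereas the doubled rendering would be 41:

filter {
    ruby {
        # Store the raw character count of the path. 36 means the
        # event holds single backslashes; 41 would mean they really
        # are doubled.
        code => "event.set('path_chars', event.get('[log][file][path]').to_s.length)"
    }
}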

My final pipeline configuration thus looks like this:

input {
    pipeline { address => "test" }
}

filter {
    grok {
        match => { "[log][file][path]" => ".*(\\|\/).*(\\|\/)(?<product>.*)(\\|\/).*"}
    }
}

output {
    stdout { codec => rubydebug }
}
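
Note that the (\\|\/) alternation matches either a backslash or a forward slash as the path separator, so the same pattern should also work for logs coming from Unix-style paths.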

And now I successfully get the name of the last folder in the path extracted into the product field, without the _grokparsefailure tag.