I am trying to understand how to use grok to filter my Apache error logs.

My error log file looks like:

[Thu Feb 27 13:22:44 2014] [error] [client 10.110.64.71] script not found or unable to stat: /var/www/cgi-bin/php4

How can I use grok to filter that? I've got this far:

filter {
  grok {
    type => "apache-error"
    pattern => "\[%{HTTPDATE:timestamp}\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}"
  }
}

I tried using the Grok Debugger, but I barely have an idea what I'm doing. I am brand new to Logstash.

1 Answer

Here is how to use the Grok Debugger app:

Paste your log line into the "input" box and the grok pattern you want to test into the "pattern" box. The regex engine then tries to match the pattern against the input text. Whatever matches is extracted and displayed in the output box as JSON, with the field names you specified as keys.

Grok patterns are essentially named, reusable regular expressions. In your case:

Input:  [Thu Feb 27 13:22:44 2014] [error] [client 10.110.64.71] script not found or unable to stat: /var/www/cgi-bin/php4

Your_Pattern: \[%{HTTPDATE:timestamp}\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}

This shows 'No Matches' because the HTTPDATE pattern is defined as %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}. That is the access-log date format, for example 27/Feb/2014:13:22:44 +0000, whereas your error log writes the date as Thu Feb 27 13:22:44 2014.

So HTTPDATE cannot match your date format, and the match fails right there. The regex engine scans the input for the first position where the whole pattern can match. Because the very first element of the pattern, the bracketed date, never matches, it returns no matches at all.

The correct pattern to specify would be this:

\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}

Here, I have wrapped the sub-pattern %{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR} in a named capture group, so that everything it matches is stored in a single field called 'timestamp'. The syntax for this is:

(?<new_name>regular expression / grok)
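
For completeness, here is roughly how that pattern could sit in your Logstash config. Treat it as a sketch rather than something tested against your setup. It assumes a Logstash version that supports conditionals and the grok filter's match option (instead of the older type and pattern settings from your attempt), that your events carry the type "apache-error" as in your config, and that the raw log line is in the message field.

```
filter {
  # Assumption: events were given type "apache-error" on the input side.
  if [type] == "apache-error" {
    grok {
      # Assumption: the raw log line is stored in the "message" field.
      # The pattern is the corrected one from above, unchanged.
      match => { "message" => "\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}" }
    }
  }
}
```

With default settings, grok tags any event it cannot match with _grokparsefailure, which is a quick way to spot lines the pattern still does not cover.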

This post provides a good explanation of using groks.