
I am attempting to filter a SQL Server error log using Logstash and grok. Logstash 1.3.3 is running as a Windows service using NSSM and JRE 6.
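For context, a service like this is typically registered with NSSM along these lines (all paths here are hypothetical, not taken from my actual setup):

    nssm install Logstash "C:\Java\jre6\bin\java.exe" -jar C:\logstash\logstash-1.3.3-flatjar.jar agent -f C:\logstash\logstash.conf

My config file is below: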

input {
    file {
        path => "c:\program files\microsoft sql server\mssql10_50.mssqlserver\mssql\log\errorlog"
        type => "SQLServerLog"
        start_position => "beginning"
        codec => plain {
            charset => "UTF-8"
        }
    }
}

filter {
    grok {
        type => "SQLServerLog"
        match => [ "message", "%{DATESTAMP:DateStamp} %{WORD:Process} %{GREEDYDATA:Message}" ]
        named_captures_only => true
        singles => true
        remove_tag => [ "_grokparsefailure" ]
        add_tag => [ "GrokFilterWorked" ]
    }
}

output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        embedded => true
    }
}

A sample of the log file content is below.

2014-01-31 00:00:38.73 spid21s     This instance of SQL Server has been using a process ID of 14632 since 28/01/2014 13:09:24 (local) 28/01/2014 13:09:24 (UTC). This is an informational message only; no user action is required.

Events are visible in Kibana, but when an event is collapsed the message is displayed like {"message":"\u00002\u00000\u00001\u00004...

When expanded, the table view shows the event message as readable text instead. The raw data for the event is shown below.

{
  "_index": "logstash-2014.01.31",
  "_type": "SQLServerLog",
  "_id": "NpvKSf4eTFSHkBdoG3zw6g",
  "_score": null,
  "_source": {
    "message": "\u00002\u00000\u00001\u00004\u0000-\u00000\u00001\u0000-\u00003\u00000\u0000 \u00000\u00000\u0000:\u00000\u00000\u0000:\u00002\u00001\u0000.\u00006\u00004\u0000 \u0000s\u0000p\u0000i\u0000d\u00002\u00004\u0000s\u0000 \u0000 \u0000 \u0000 \u0000 \u0000T\u0000h\u0000i\u0000s\u0000 \u0000i\u0000n\u0000s\u0000t\u0000a\u0000n\u0000c\u0000e\u0000 \u0000o\u0000f\u0000 \u0000S\u0000Q\u0000L\u0000 \u0000S\u0000e\u0000r\u0000v\u0000e\u0000r\u0000 \u0000h\u0000a\u0000s\u0000 \u0000b\u0000e\u0000e\u0000n\u0000 \u0000u\u0000s\u0000i\u0000n\u0000g\u0000 \u0000a\u0000 \u0000p\u0000r\u0000o\u0000c\u0000e\u0000s\u0000s\u0000 \u0000I\u0000D\u0000 \u0000o\u0000f\u0000 \u00001\u00004\u00006\u00003\u00002\u0000 \u0000s\u0000i\u0000n\u0000c\u0000e\u0000 \u00002\u00008\u0000/\u00000\u00001\u0000/\u00002\u00000\u00001\u00004\u0000 \u00001\u00003\u0000:\u00000\u00009\u0000:\u00002\u00004\u0000 \u0000(\u0000l\u0000o\u0000c\u0000a\u0000l\u0000)\u0000 \u00002\u00008\u0000/\u00000\u00001\u0000/\u00002\u00000\u00001\u00004\u0000 \u00001\u00003\u0000:\u00000\u00009\u0000:\u00002\u00004\u0000 \u0000(\u0000U\u0000T\u0000C\u0000)\u0000.\u0000 \u0000T\u0000h\u0000i\u0000s\u0000 \u0000i\u0000s\u0000 \u0000a\u0000n\u0000 \u0000i\u0000n\u0000f\u0000o\u0000r\u0000m\u0000a\u0000t\u0000i\u0000o\u0000n\u0000a\u0000l\u0000 \u0000m\u0000e\u0000s\u0000s\u0000a\u0000g\u0000e\u0000 \u0000o\u0000n\u0000l\u0000y\u0000;\u0000 \u0000n\u0000o\u0000 \u0000u\u0000s\u0000e\u0000r\u0000 \u0000a\u0000c\u0000t\u0000i\u0000o\u0000n\u0000 \u0000i\u0000s\u0000 \u0000r\u0000e\u0000q\u0000u\u0000i\u0000r\u0000e\u0000d\u0000.\u0000\r\u0000",
    "@version": "1",
    "@timestamp": "2014-01-31T08:55:03.373Z",
    "type": "SQLServerLog",
    "host": "MyMachineName",
    "path": "C:\\Program Files\\Microsoft SQL Server\\MSSQL10_50.MSSQLSERVER\\MSSQL\\Log\\ERRORLOG"
  },
  "sort": [
    1391158503373,
    1391158503373
  ]
}

I am unsure whether the encoding of the message is what prevents grok from filtering it properly; the interleaved \u0000 characters look like UTF-16 text decoded as a single-byte encoding (each character followed by a null byte).

I would like to be able to filter these events using Grok and am unsure how to proceed.

Further info:

I created a copy of the log file saved as UTF-8 and the filter worked fine, so it is definitely a charset issue. I guess once I determine the correct charset for the log file, it should work.
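For reference, a minimal sketch of the same input with the charset set to UTF-16LE, the encoding SQL Server typically uses for its ERRORLOG on Windows (the exact value is my assumption here, and as one of the comments below notes, results with this setting have been mixed):

input {
    file {
        path => "c:\program files\microsoft sql server\mssql10_50.mssqlserver\mssql\log\errorlog"
        type => "SQLServerLog"
        start_position => "beginning"
        # assumption: ERRORLOG is UTF-16LE; adjust if the file turns out to differ
        codec => plain {
            charset => "UTF-16LE"
        }
    }
}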

Do you know what encoding the input file is in? You might be able to apply a codec (such as plain) with a custom charset option. Barring that, I found some information on a bug report suggesting hints about LS's attempts to determine the expected encoding from the host machine's environment. If you can't find help here, the Logstash IRC channel on freenode tends to be handy. – rutter
Thanks for the reply. I added the codec and charset and it is still displaying oddly. Updated the question with the changes. – Ally Reilly
Having the same issue. SQL Server's errorlog is encoded in UTF-16LE, but specifying that as the "charset" parameter in the plain codec seems to decode the first line entry, then fails with { "message" => "??\u3100?????\u3100\u3100?????????????????????????????? } afterwards. Seems like a weird issue within Logstash. – test-in-prod

1 Answer


So I had the same issue reading the SQL Server log file.

Then I realised that SQL Server logs the same entries to the Windows Event Log, which Logstash supports as an input.

SQL Server logs entries with the 'MSSQLSERVER' source on my systems. The eventlog input ships in the logstash-contrib package: simply extract its contents over the base Logstash files on your Windows box (wherever you run Logstash to collect data).

I have my Logstash agent configured to simply ship the entries to another Logstash instance on a Linux box, which does some other things not relevant to this question ;)

Example logstash.conf:

input {
  eventlog {
    type => "Win32-EventLog"
    logfile => ["Application", "Security", "System"]
  }
}

filter {
  if "MSSQLSERVER" in [SourceName] {
    # Track logon failures
    grok {
      match => ["Message", "Login failed for user '%{DATA:username}'\..+CLIENT: %{IP:client_ip}"]
    }
    dns {
      action => "append"
      resolve => "client_ip"
    }
  }
}

output {
  stdout { codec => rubydebug }

  tcp {
    host => "another-logstash-instance.local"
    port => "5115"
    codec => "json_lines"
  }
}
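On the receiving Linux instance, a matching input would be a sketch along these lines (the port mirrors the tcp output above; everything else is assumed):

input {
  tcp {
    # listen for the json_lines events shipped by the Windows agent
    port => "5115"
    codec => "json_lines"
  }
}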

Hope this helps.