
I am having a lot of trouble figuring out a grok expression for the following message type (coming from a Sophos UTM):

Apr 28 16:57:49 utm-vap-xx.domain.local 2018: 04:28-17:02:05 s-utm-01 httpproxy[52816]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="POST" srcip="10.11.110.5" dstip="216.163.176.36" user="" group="" ad_domain="" statuscode="200" cached="0" profile="REF_DefaultHTTPProfile (Default Web Filter Profile)" filteraction="REF_DefaultHTTPCFFAction (Default content filter action)" size="15" request="0xdae2cc00" url="http://iprep3.t.ctmail.com/SpamResolverNG/SpamResolverNG.dll?DoNewRequest" referer="" error="" authtime="0" dnstime="905" cattime="143" avscantime="2275" fullreqtime="238344" device="0" auth="0" ua="Mozilla/4.0 (compatible; Win32; Commtouch Http Client (curl))" exceptions="" category="178" reputation="neutral" categoryname="Internet Services" country="United States" content-type="text/html" sandbox="-"

The problem arises when I want to skip fields or when a key/value pair contains an empty string. For example, in Logstash, adding something like

()?(srcip={%"IP:SourceIP"})

causes problems, although it does work in the online grok builder.
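A minimal sketch of how an optional pair and an empty value are usually written in a Logstash grok filter. The field names (SourceIP, DestinationIP, user) are only illustrative. The whole key="value" pair is wrapped in a non-capturing group made optional with ?, and DATA (a lazy .*?) matches an empty value such as user="". By default grok drops empty captures, so keep_empty_captures is needed if you want the empty field to exist at all.

```logstash
filter {
  grok {
    # (?:...)? makes the whole srcip="..." pair optional, so lines without it still match.
    # DATA is a lazy .*?, so user="" matches with an empty capture.
    match => { "message" => '(?:srcip="%{IP:SourceIP}" )?dstip="%{IP:DestinationIP}" user="%{DATA:user}"' }
    # Without this option grok discards empty captures and "user" is not created.
    keep_empty_captures => true
  }
}
```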

The goal is to get fields such as:

Sub
SourceIP
Destination IP
Protocol etc

I am also intending to use the geo-tags in Logstash which I already have working with other sources.

Looking forward to receiving some valuable help. Thanks!

Use the kv filter instead: use grok only to extract the text after httpproxy[52816]:, then run the kv filter on that. The default field and value splits should work, but you'll have to use trim_value to remove the " characters. - baudsp
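A sketch of the grok + kv approach from the comment above, assuming the header always has the shape shown in the question. The field names utm_timestamp, utm_host, utm_kv and dst_geo are hypothetical. grok only splits off the header; kv then turns every key="value" pair into a field, so missing or empty pairs no longer break the match the way they do in a fixed grok pattern. Since srcip in the sample (10.11.110.5) is a private address that will not resolve, the geoip lookup is pointed at dstip.

```logstash
filter {
  # Parse only the header; everything after "httpproxy[52816]: " ends up in utm_kv.
  grok {
    match => { "message" => '%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:logsource} %{YEAR}: %{DATA:utm_timestamp} %{SYSLOGHOST:utm_host} %{SYSLOGPROG}: %{GREEDYDATA:utm_kv}' }
  }
  # Split the body into key="value" pairs; trim_value strips the double quotes.
  kv {
    source      => "utm_kv"
    field_split => " "
    value_split => "="
    trim_value  => '"'
  }
  # Geo-locate the (public) destination address.
  geoip {
    source => "dstip"
    target => "dst_geo"
  }
}
```

SYSLOGPROG also sets program and pid, which is handy for applying kv only to certain UTM daemons.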

2 Answers


Someone has written a grok pattern for Sophos UTM 9.x for Logstash:

Extremely enhanced grok pattern for Sophos UTM ulogd and pluto log messages in various formats, with the message body separated from the header

 # only parse events whose type is "sophosutm"
 if [type] == "sophosutm" {
  grok {
   match => { "message" => '(?:%{SYSLOGTIMESTAMP:timestamp}|%{TIMESTAMP_ISO8601:timestamp8601}) (?:%{SYSLOGHOST:logsource}) (?:%{YEAR}): (?:%{MONTHNUM}):(?:%{MONTHDAY})-(?:%{HOUR}):(?:%{MINUTE}):(?:%{SECOND}) (?:%{SYSLOGHOST}) (?:%{SYSLOGPROG}): (?<messagebody>(?:id=\"%{INT:utm_id}\" severity=\"%{LOGLEVEL:utm_severity}\" sys=\"%{DATA:utm_sys}\" sub=\"%{DATA:utm_sub}\" name=\"%{DATA:utm_name}\" action=\"%{DATA:utm_action}\" fwrule=\"%{INT:utm_ulogd_fwrule}\" initf=\"%{DATA:utm_ulogd_initf}\" outitf=\"%{DATA:utm_ulogd_outif}\" (?:srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\" dstmac=\"%{GREEDYDATA:utm_ulogd_dstmac}\"|srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\") srcip=\"%{IP:utm_srcip}\" dstip=\"%{IP:utm_dstip}\" proto=\"%{INT:utm_protocol}\" length=\"%{INT:utm_ulogd_pkglength}\" tos=\"%{DATA:utm_ulogd_tos}\" prec=\"%{DATA:utm_ulogd_prec}\" ttl=\"%{INT:utm_ulogd_ttl}\" srcport=\"%{INT:utm_srcport}\" dstport=\"%{INT:utm_dstport}\" tcpflags=\"%{DATA:utm_ulogd_tcpflags}\"|id=\"%{INT:utm_id}\" severity=\"%{LOGLEVEL:utm_severity}\" sys=\"%{DATA:utm_sys}\" sub=\"%{DATA:utm_sub}\" name=\"%{DATA:utm_name}\" action=\"%{DATA:utm_action}\" fwrule=\"%{INT:utm_ulogd_fwrule}\" initf=\"%{DATA:utm_ulogd_initf}\" outitf=\"%{DATA:utm_ulogd_outif}\" (?:srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\" dstmac=\"%{GREEDYDATA:utm_ulogd_dstmac}\"|srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\") srcip=\"%{IP:utm_srcip}\" dstip=\"%{IP:utm_dstip}\" proto=\"%{INT:utm_protocol}\" length=\"%{INT:utm_ulogd_pkglength}\" tos=\"%{DATA:utm_ulogd_tos}\" prec=\"%{DATA:utm_ulogd_prec}\" ttl=\"%{INT:utm_ulogd_ttl}\" srcport=\"%{INT:utm_srcport}\" dstport=\"%{INT:utm_dstport}\"|id=\"%{INT:utm_id}\" severity=\"%{LOGLEVEL:utm_severity}\" sys=\"%{DATA:utm_sys}\" sub=\"%{DATA:utm_sub}\" name=\"%{DATA:utm_name}\" action=\"%{DATA:utm_action}\" reason=\"%{DATA:utm_ips_reason}\" group=\"%{INT:utm_ips_group}\" srcip=\"%{IP:utm_srcip}\" dstip=\"%{IP:utm_dstip}\" proto=\"%{INT:utm_protocol}\" srcport=\"%{INT:utm_srcport}\" dstport=\"%{INT:utm_dstport}\" sid=\"%{INT:utm_ips_sid}\" class=\"%{DATA:utm_ips_class}\" priority=\"%{INT:utm_ips_priority}\" generator=\"%{INT:utm_ips_generator}\" msgid=\"%{INT:utm_ips_msgid}\"|\"%{DATA:utm_pluto_vpnname}\"\[%{INT}\] %{IP:utm_pluto_vpnremoteip} #%{INT}: %{GREEDYDATA:utm_pluto_message}|%{GREEDYDATA}))' }
  }
 }

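This matches your log line (I have tested it). Note, however, that an httpproxy message like yours does not match any of the ulogd, IPS or pluto alternatives, so its body falls through to the final %{GREEDYDATA} branch and is stored whole in messagebody rather than split into individual fields.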
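Because httpproxy lines only end up in messagebody with the pattern above, one option is to hand that field to the kv filter afterwards. This is a sketch, assuming the Sophos grok runs earlier in the same filter section (SYSLOGPROG inside it sets [program]); the utm_ prefix is chosen only to line up with the pattern's field names, so kv's srcip becomes utm_srcip.

```logstash
filter {
  # Runs after the Sophos grok pattern, which sets [program] and [messagebody].
  if [program] == "httpproxy" {
    kv {
      source     => "messagebody"
      prefix     => "utm_"
      trim_value => '"'
    }
  }
}
```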

However, if you don't need every value as a separate field and only want the specific fields mentioned in your question, you can let GREEDYDATA (or .*, which is what GREEDYDATA expands to) swallow the unneeded text and extract only the desired fields, as follows:

sub=%{QUOTEDSTRING:protocol}\s*.*\s*srcip=\"%{IP:SourceIP}\"\s*dstip=\"%{IP:DestinationIP}\"

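The above grok pattern extracts sub (as protocol), SourceIP and DestinationIP, and produces the following output in the grok debugger (note that QUOTEDSTRING keeps the surrounding quotes in protocol):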

{
  "protocol": [
    [
      "\"http\""
    ]
  ],
  "SourceIP": [
    [
      "10.11.110.5"
    ]
  ],
  "IPV6": [
    [
      null,
      null
    ]
  ],
  "IPV4": [
    [
      "10.11.110.5",
      "216.163.176.36"
    ]
  ],
  "DestinationIP": [
    [
      "216.163.176.36"
    ]
  ]
}

Further fields can be extracted in the same way.
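Dropped into a pipeline, the pattern above might look like the sketch below. Since QUOTEDSTRING captures the quotes as part of the value, a mutate gsub is added here (not part of the original answer) to strip them from protocol; matching sub="%{DATA:protocol}" instead would be another way to avoid them.

```logstash
filter {
  grok {
    match => { "message" => 'sub=%{QUOTEDSTRING:protocol}\s*.*\s*srcip=\"%{IP:SourceIP}\"\s*dstip=\"%{IP:DestinationIP}\"' }
  }
  mutate {
    # Remove the literal double quotes that QUOTEDSTRING kept in "protocol".
    gsub => [ "protocol", '"', "" ]
  }
}
```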


Here is what I used to grok ModSecurity messages. I never went back to combine the expressions into one, so each is called out separately, but this approach should work for your reverseproxy.log or anything else that is not a straight key/value format.

if "ModSecurity:" in [message] {
  grok {
    break_on_match => false
    match => { "message" => ' \[hostname %{QUOTEDSTRING:Hostname}\] \[client %{IPORHOST:Source_IP}\]' }
  }
  grok {
    break_on_match => false
    # Source_IP may already be set by the grok above; overwrite it instead of
    # turning it into an array holding the same address twice
    overwrite => [ "Source_IP" ]
    match => { "message" => ' \[client %{IPORHOST:Source_IP}\]' }
  }
  grok {
    break_on_match => false
    match => { "message" => ' \[severity %{QUOTEDSTRING:Rule_Severity}\]' }
  }
  grok {
    break_on_match => false
    match => { "message" => ' \[id %{QUOTEDSTRING:Rule_ID}\]' }
  }
  grok {
    break_on_match => false
    match => { "message" => ' \[uri %{QUOTEDSTRING:Target_URI}\]' }
    add_field => { "Logsource" => "Reverse Proxy (Modsecurity)" }
  }
  grok {
    break_on_match => false
    match => { "message" => '\[msg %{QUOTEDSTRING:MSG}\]' }
    add_field => { "received_at" => "%{@timestamp}" }
  }
  date {
    # syslog_timestamp is not extracted in this snippet; it must be set by an earlier filter
    match => [ "syslog_timestamp", "yyyy:MM:dd-HH:mm:ss" ]
  }
  mutate {
    # no-op as written; add any field clean-up here
  }
} #end "if ModSecurity:"
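If you do want to fold the separate grok filters into one, a possible sketch is a single grok with an array of patterns and break_on_match => false, which makes grok try every pattern instead of stopping at the first match. This is an untested variation, not the configuration above; the hostname pattern is reduced to just the hostname so that Source_IP is captured only once.

```logstash
if "ModSecurity:" in [message] {
  grok {
    # Try all patterns and keep every capture that matches.
    break_on_match => false
    match => { "message" => [
      '\[hostname %{QUOTEDSTRING:Hostname}\]',
      '\[client %{IPORHOST:Source_IP}\]',
      '\[severity %{QUOTEDSTRING:Rule_Severity}\]',
      '\[id %{QUOTEDSTRING:Rule_ID}\]',
      '\[uri %{QUOTEDSTRING:Target_URI}\]',
      '\[msg %{QUOTEDSTRING:MSG}\]'
    ] }
  }
}
```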