Somebody has written a Grok Sophos UTM 9.x Pattern for logstash,
Extremly enhanced Grok Pattern for Sophos UTM ulogd and pluto log
messages in various formats and separated the messagebody from the
header
grok {
pattern => ['(?:%{SYSLOGTIMESTAMP:timestamp}|%{TIMESTAMP_ISO8601:timestamp8601}) (?:%{SYSLOGHOST:logsource}) (?:%{YEAR}): (?:%{MONTHNUM}):(?:%{MONTHDAY})-(?:%{HOUR}):(?:%{MINUTE}):(?:%{SECOND}) (?:%{SYSLOGHOST}) (?:%{SYSLOGPROG}): (?<messagebody>(?:id=\"%{INT:utm_id}\" severity=\"%{LOGLEVEL:utm_severity}\" sys=\"%{DATA:utm_sys}\" sub=\"%{DATA:utm_sub}\" name=\"%{DATA:utm_name}\" action=\"%{DATA:utm_action}\" fwrule=\"%{INT:utm_ulogd_fwrule}\" initf=\"%{DATA:utm_ulogd_initf}\" outitf=\"%{DATA:utm_ulogd_outif}\" (?:srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\" dstmac=\"%{GREEDYDATA:utm_ulogd_dstmac}\"|srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\") srcip=\"%{IP:utm_srcip}\" dstip=\"%{IP:utm_dstip}\" proto=\"%{INT:utm_protocol}\" length=\"%{INT:utm_ulogd_pkglength}\" tos=\"%{DATA:utm_ulogd_tos}\" prec=\"%{DATA:utm_ulogd_prec}\" ttl=\"%{INT:utm_ulogd_ttl}\" srcport=\"%{INT:utm_srcport}\" dstport=\"%{INT:utm_dstport}\" tcpflags=\"%{DATA:utm_ulogd_tcpflags}\"|id=\"%{INT:utm_id}\" severity=\"%{LOGLEVEL:utm_severity}\" sys=\"%{DATA:utm_sys}\" sub=\"%{DATA:utm_sub}\" name=\"%{DATA:utm_name}\" action=\"%{DATA:utm_action}\" fwrule=\"%{INT:utm_ulogd_fwrule}\" initf=\"%{DATA:utm_ulogd_initf}\" outitf=\"%{DATA:utm_ulogd_outif}\" (?:srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\" dstmac=\"%{GREEDYDATA:utm_ulogd_dstmac}\"|srcmac=\"%{GREEDYDATA:utm_ulogd_srcmac}\") srcip=\"%{IP:utm_srcip}\" dstip=\"%{IP:utm_dstip}\" proto=\"%{INT:utm_protocol}\" length=\"%{INT:utm_ulogd_pkglength}\" tos=\"%{DATA:utm_ulogd_tos}\" prec=\"%{DATA:utm_ulogd_prec}\" ttl=\"%{INT:utm_ulogd_ttl}\" srcport=\"%{INT:utm_srcport}\" dstport=\"%{INT:utm_dstport}\"|id=\"%{INT:utm_id}\" severity=\"%{LOGLEVEL:utm_severity}\" sys=\"%{DATA:utm_sys}\" sub=\"%{DATA:utm_sub}\" name=\"%{DATA:utm_name}\" action=\"%{DATA:utm_action}\" reason=\"%{DATA:utm_ips_reason}\" group=\"%{INT:utm_ips_group}\" srcip=\"%{IP:utm_srcip}\" dstip=\"%{IP:utm_dstip}\" proto=\"%{INT:utm_protocol}\" srcport=\"%{INT:utm_srcport}\" dstport=\"%{INT:utm_dstport}\" sid=\"%{INT:utm_ips_sid}\" class=\"%{DATA:utm_ips_class}\" priority=\"%{INT:utm_ips_priority}\" generator=\"%{INT:utm_ips_generator}\" msgid=\"%{INT:utm_ips_msgid}\"|\"%{DATA:utm_pluto_vpnname}\"\[%{INT}\] %{IP:utm_pluto_vpnremoteip} #%{INT}: %{GREEDYDATA:utm_pluto_message}|%{GREEDYDATA}))']
type => "sophosutm"
}
this works well with your log (i have tested it)
However, if you are not interested in entire data separated as fields and only interested in specific data as mentioned in your question then you can assign un-necessary data as GREEDYDATA and only extract the desired fields as follows,
sub=%{QUOTEDSTRING:protocol}\s*.*\s*srcip=\"%{IP:SourceIP}\"\s*dstip=\"%{IP:DestinationIP}\"
The above grok pattern will extract, Sub, SourceIP and Destination IP, and produce following output,
{
"protocol": [
[
""http""
]
],
"SourceIP": [
[
"10.11.110.5"
]
],
"IPV6": [
[
null,
null
]
],
"IPV4": [
[
"10.11.110.5",
"216.163.176.36"
]
],
"DestinationIP": [
[
"216.163.176.36"
]
]
}
further data can be filtered using same pattern.
httpproxy[52816]:, then use the kv filter on this. The default field and value split should work, but you'll have to use trim_value to remove the"- baudsp