I am trying to capture the timestamp in this log event (for Splunk):

172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | 2020-06-10 17:43:18,951 | "POST /rest/build-status/latest/commits/stats HTTP/1.1" | "http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | 200 | 345 | 431 | - | 5 | 3dk4qm | 

With the TIME_PREFIX setting, Splunk software uses the specified regular expression to look for a match before attempting to extract a timestamp.

TIME_PREFIX = <regular expression>  

By default, Splunk tries to extract the timestamp from the start of the line, but the line starts with an IP address. Hence the need for a regex that matches everything up to and including the fourth pipe, which serves as the TIME_PREFIX.

By using the following regex

(?:[^\|]*(\|)){4}

I want the regex to match up to the fourth occurrence of '|' and then stop (non-greedy, I guess).

You need ^(?:[^|]*\|){4}(?<value>[^|]*), I believe. See the regex demo. Or ^(?:[^|]*\|){4}\s*(?<value>[^|]*[^|\s]) – Wiktor Stribiżew
Please try it and let me know if it works for you. – Wiktor Stribiżew
The first one will match the timestamp, which I do not need. I only need it to stop at the fourth occurrence of the pipe, not capture anything. But it might be that I actually do not need the regex to stop; I am checking now whether Splunk got what it needed. – rhellem
What exactly do you want to extract? You said in your question that you want to capture the timestamp, yet in your comment you say you do not need it. – Chase
Then ^(?:[^|]*\|){4}\s* will do. – Wiktor Stribiżew

1 Answer

There are two things to consider:

  • Anchor the pattern at the start of the string; otherwise, the environment may retry the regex search at every position inside the string, and you may get many more matches than you expect.

  • When you do not need to create captures, i.e. when you needn't save part of the regex match to a separate memory buffer (in Splunk, this is equal to creating a separate field), you should use a non-capturing group rather than a capturing one when grouping a sequence of patterns.
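Both points can be checked with a minimal Python sketch. It uses Python's re module as a stand-in for Splunk's regex engine, which is an assumption (the two engines are not identical), and the log line is a shortened version of the one in the question with the referrer and user-agent field left out.

```python
import re

# Shortened version of the log line from the question (referrer/user-agent
# field omitted for readability).
line = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

unanchored = re.compile(r'(?:[^|]*\|){4}')
anchored = re.compile(r'^(?:[^|]*\|){4}')

# Without the anchor, the search restarts after each match and finds
# further groups of four pipes later in the line.
unanchored_spans = [m.span() for m in unanchored.finditer(line)]
# With the anchor, only the match starting at position 0 is possible.
anchored_spans = [m.span() for m in anchored.finditer(line)]

# The regex from the question wraps the pipe in a capturing group;
# the non-capturing variant creates no groups at all.
question_regex = re.compile(r'(?:[^\|]*(\|)){4}')
answer_regex = re.compile(r'^(?:[^|]*\|){4}')

print(len(unanchored_spans), len(anchored_spans))
print(question_regex.groups, answer_regex.groups)
```

The shortened line contains twelve pipes, so the unanchored pattern matches three times while the anchored one matches once.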

Thus, you need

^(?:[^|]*\|){4}\s*

See the regex demo showing the match extends to the datetime substring without matching it.

Details

  • ^ - start of string anchor
  • (?:[^|]*\|){4} - a non-capturing group ((?:...)) that matches four repetitions ({4}) of any 0 or more chars other than | ([^|]*) and then a | char (\|)
  • \s* - 0 or more whitespace characters.
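As a quick check of the pattern described above, the following Python sketch confirms that the match ends right before the timestamp. Python's re module is used here only as an approximation of Splunk's engine, the log line is a shortened version of the one in the question, and the strptime format string is an assumption chosen for this illustration rather than a Splunk setting.

```python
import re
from datetime import datetime

# Shortened version of the log line from the question.
line = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

prefix = re.compile(r'^(?:[^|]*\|){4}\s*')
match = prefix.match(line)

# Everything after the prefix match; it should begin with the timestamp.
rest = line[match.end():]

# "2020-06-10 17:43:18,951" is 23 characters long.
stamp = rest[:23]
parsed = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S,%f')

print(repr(stamp))
print(parsed.isoformat())
```

The prefix match consumes the four leading fields, their pipes, and the following space, so whatever reads the timestamp next starts exactly at the date.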