Splunk - regex extract fields from source

Question

I am trying to extract the job name and region from the Splunk source using a regex.

Below is the format of my sample source:

/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log

With the regex below, I am able to extract the job name:

(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/job_(?<jobname>.+)_\d+

Here is the match so far:

Full match  0-53    /home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414
Group `logdir`  0-19    /home/app/abc/logs/
Group `date`    19-27   20200817
Group `jobname` 32-47   DAILY_HR_REPORT
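To reproduce this match outside Splunk, here is a minimal Python sketch. It assumes Python's standard re module, which spells named groups as (?P<name>...) instead of (?<name>...), so that is the only change made to the question's pattern.

```python
import re

# The question's pattern, transcribed for Python's re module:
# (?<name>...) becomes (?P<name>...); everything else is unchanged.
QUESTION_PATTERN = re.compile(
    r"(?P<logdir>\/[\W\w]+\/[\W\w]+\/)(?P<date>[^\/]+)\/job_(?P<jobname>.+)_\d+"
)

source = "/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log"

m = QUESTION_PATTERN.search(source)
print("Full match", m.span(), m.group(0))
for name in ("logdir", "date", "jobname"):
    # span(name) returns the same start/end offsets a regex tester shows
    print(name, m.span(name), m.group(name))
```

This prints the spans (0, 53), (0, 19), (19, 27) and (32, 47), the same offsets listed in the match output. Nothing in this pattern captures the region yet.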

I also need USA (the region) from the source. Can you please suggest how? The region always appears after the number field (44414), whose number of digits can vary, e.g. 123, 1234, 56789.

Thank you in advance.

Your regex seems quite appropriate for what you achieved. Why can't you develop the last part the same way? Is there a special obstacle which got you stuck? What did you try? How did it fail? — Yunnosch

The fourth bird · Accepted Answer · 2020-08-20T06:24:16

You could make the pattern a bit more specific about what you allow to match. [\W\w]+ matches any character and .+ matches any character except a newline, so both first consume as much as they can and then have to backtrack to fit the rest of the pattern.

Then, for the region, you can add a named group at the end, (?<region>[^\W_]+), which matches one or more word characters except an underscore.
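To see why [^\W_]+ is used rather than simply appending another .+, here is a small Python sketch. The two tail patterns are illustrative fragments only, not the full regex from this answer, and Python's re spells the named group as (?P<name>...). [^\W_] means "not a non-word char and not an underscore", i.e. letters and digits, so the match stops at the underscore before log.

```python
import re

source = "/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log"

# Naive tail: .+ is greedy and also swallows the trailing "_log".
naive = re.search(r"_\d+_(?P<region>.+)", source)

# More specific tail: [^\W_]+ matches word chars except "_",
# so the region ends right before "_log".
specific = re.search(r"_\d+_(?P<region>[^\W_]+)_log", source)

print(naive.group("region"))     # USA_log
print(specific.group("region"))  # USA
```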

In parts

(?<logdir>\/(?:[^\/]+\/)*)(?<date>(?:19|20)\d{2}(?:0?[1-9]|1[012])(?:0[1-9]|[12]\d|3[01]))\/job_(?<jobname>\w+)_\d+_(?<region>[^\W_]+)_log

(?<logdir> Group logdir
- \/(?:[^\/]+\/)* Match / and then optionally repeat 1+ chars other than / followed by another /
) Close group
(?<date> Group date
- (?:19|20)\d{2} Match a year starting with 19 or 20
- (?:0?[1-9]|1[012]) Match a month 1-12, with an optional leading zero
- (?:0[1-9]|[12]\d|3[01]) Match a day 01-31
) Close group
\/job_ Match /job_
(?<jobname>\w+) Group jobname, match 1+ word chars
_\d+_ Match 1+ digits between underscores
(?<region>[^\W_]+) Group region, match 1+ word chars except _
_log Match _log literally
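Here is a minimal Python sketch that runs this pattern against the sample path and against variants whose number field has 3, 4 and 5 digits (the 123, 1234 and 56789 examples from the question). For Python's re module, each (?<name>...) group is written as (?P<name>...); in Splunk you would keep the original (?<name>...) spelling, as in the question. The with_number helper and the variant paths are hypothetical test inputs built from the sample.

```python
import re

# The answer's pattern; only the named-group syntax is adapted for Python.
ANSWER_PATTERN = re.compile(
    r"(?P<logdir>\/(?:[^\/]+\/)*)"
    r"(?P<date>(?:19|20)\d{2}(?:0?[1-9]|1[012])(?:0[1-9]|[12]\d|3[01]))"
    r"\/job_(?P<jobname>\w+)_\d+_(?P<region>[^\W_]+)_log"
)

SAMPLE = "/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log"


def with_number(number: str) -> str:
    """Hypothetical helper: swap the 44414 field for another digit string."""
    return SAMPLE.replace("44414", number)


for path in [SAMPLE] + [with_number(n) for n in ("123", "1234", "56789")]:
    m = ANSWER_PATTERN.search(path)
    print(m.group("logdir"), m.group("date"), m.group("jobname"), m.group("region"))
```

Each of the four paths yields /home/app/abc/logs/, 20200817, DAILY_HR_REPORT and USA. Because \w+ also matches digits and underscores, jobname is settled by backtracking until _\d+_ plus the region and _log can still match, which is why it stops right before the number field whatever its length.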

